Layer Normalization Explained


Unlike batch normalization, layer normalization estimates the normalization statistics directly from the summed inputs to the neurons within a hidden layer, so the normalization does not introduce any new dependencies between training cases. It works well for RNNs and improves both the training time and the generalization performance of several existing RNN models. More recently, it has been used with Transformer models.

We compute the layer normalization statistics over all the hidden units in the same layer as follows:

$$\mu^{l} = \frac{1}{H}\sum_{i=1}^{H} a_{i}^{l}$$

$$\sigma^{l} = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_{i}^{l}-\mu^{l}\right)^{2}}$$

where $H$ denotes the number of hidden units in a layer. Under layer normalization, all the hidden units in a layer share the same normalization terms $\mu$ and $\sigma$, but different training cases have different normalization terms. Unlike batch normalization, layer normalization does not impose any constraint on the size of the mini-batch, and it can be used in the pure online regime with batch size 1.
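The statistics above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names `layer_norm_stats` and `layer_norm` are hypothetical, the small constant `eps` is an assumption added for numerical safety (it does not appear in the formulas), and the learned gain and bias that usually follow the normalization are left out.

```python
import numpy as np


def layer_norm_stats(a):
    """Return per-example mean and standard deviation over the hidden units.

    `a` has shape (batch, H); statistics are taken over the last axis only,
    so each training case gets its own mu and sigma.
    """
    mu = a.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(((a - mu) ** 2).mean(axis=-1, keepdims=True))
    return mu, sigma


def layer_norm(a, eps=1e-5):
    """Normalize each example with its own statistics.

    `eps` is an assumed stabilizer, not part of the formulas in the text.
    """
    mu, sigma = layer_norm_stats(a)
    return (a - mu) / (sigma + eps)


# Summed inputs for a mini-batch of 4 examples with H = 8 hidden units.
rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 8))

# Normalizing one example alone gives the same result as normalizing it
# inside the batch, because no statistic is shared across examples.
full = layer_norm(batch)
single = layer_norm(batch[:1])
print(np.allclose(full[:1], single))  # True
```

Because the reduction runs over the hidden-unit axis only, the same function works unchanged with a batch of size 1, which is the online setting mentioned above.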

