Layer Normalization Explained | Papers With Code
Unlike batch normalization, Layer Normalization directly estimates the normalization statistics from the summed inputs to the neurons within a hidden layer so the normalization does not introduce any new dependencies between training cases. It works well for RNNs and improves both the training time and the generalization performance of several existing RNN models. More recently, it has been used with Transformer models.
We compute the layer normalization statistics over all the hidden units in the same layer as follows:
$$\mu^{l} = \frac{1}{H}\sum_{i=1}^{H} a_{i}^{l}$$
$$\sigma^{l} = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\left(a_{i}^{l}-\mu^{l}\right)^{2}}$$
where $H$ denotes the number of hidden units in a layer. Under layer normalization, all the hidden units in a layer share the same normalization terms $\mu$ and $\sigma$, but different training cases have different normalization terms. Unlike batch normalization, layer normalization does not impose any constraint on the size of the mini-batch, and it can be used in the pure online regime with batch size 1.
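The statistics described here can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `layer_norm_stats` and `layer_norm` are hypothetical, the small constant `eps` is an assumption added to avoid division by zero, and the learnable gain and bias that practical implementations usually apply afterwards are left out because the text above does not cover them.

```python
import numpy as np


def layer_norm_stats(a):
    """Return the per-example mean and standard deviation.

    `a` has shape (batch, H): one row of summed inputs per training case.
    Statistics are taken over the H hidden units (last axis), never over
    the batch axis.
    """
    mu = a.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(((a - mu) ** 2).mean(axis=-1, keepdims=True))
    return mu, sigma


def layer_norm(a, eps=1e-5):
    """Normalize each training case with its own mu and sigma.

    `eps` is an assumed stabilizer, not part of the formulas above.
    """
    mu, sigma = layer_norm_stats(a)
    return (a - mu) / (sigma + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(4, 8))   # 4 training cases, H = 8 hidden units

    full = layer_norm(batch)          # normalize the whole mini-batch
    single = layer_norm(batch[:1])    # normalize the first case alone

    # The first case gets the same result whether or not the other
    # cases are present, so a batch size of 1 works unchanged.
    print(np.allclose(full[:1], single))
```

Because the reduction runs over the hidden-unit axis only, each row is normalized independently of the others, which is the property that removes the dependence on mini-batch size.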