🪴 Anil's Garden

Layer Normalization Explained Papers With Code

18 Jul 2025, 2 min read

clippings

Excerpt

Unlike batch normalization, Layer Normalization directly estimates the normalization statistics from the summed inputs to the neurons within a hidden layer on a single training case, so the normalization does not introduce any new dependencies between training cases. It works well for RNNs and improves both the training time and the generalization performance of several existing RNN models. More recently, it has been used with Transformer models.
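The following minimal pure-Python sketch uses made-up numbers to make the point about dependencies between training cases concrete. It normalizes a small mini-batch in two ways. The first is per training case, over that case's hidden units, as layer normalization does. The second is per hidden unit, across the mini-batch, in the style of batch normalization, leaving out learned parameters and running averages. The helper names `normalize_rows` and `normalize_columns` and the `eps` stabilizer are illustrative assumptions and do not come from any library.

```python
import math

def normalize_rows(x, eps=1e-5):
    """Layer-norm style: statistics per training case (row), over its H hidden units."""
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out

def normalize_columns(x, eps=1e-5):
    """Batch-norm style: statistics per hidden unit (column), across the mini-batch."""
    stats = []
    for col in zip(*x):
        mu = sum(col) / len(col)
        var = sum((v - mu) ** 2 for v in col) / len(col)
        stats.append((mu, math.sqrt(var + eps)))
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in x]

# Three training cases with H = 4 hidden units each (made-up numbers).
batch = [[1.0, 2.0, 3.0, 4.0],
         [2.0, 4.0, 6.0, 8.0],
         [0.0, 1.0, 0.0, 1.0]]
# Same first case; the other two cases are replaced.
other = [batch[0], [10.0, 20.0, 30.0, 40.0], [5.0, 5.0, 5.0, 5.0]]

# Does the first case's normalized output change when its batch-mates change?
ln_unchanged = normalize_rows(batch)[0] == normalize_rows(other)[0]
bn_unchanged = normalize_columns(batch)[0] == normalize_columns(other)[0]
print(ln_unchanged, bn_unchanged)  # True False
```

The layer-normalized output of the first case depends only on that case's own hidden units, so replacing the other cases leaves it unchanged. Under per-unit statistics computed across the batch, the same output shifts.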

We compute the layer normalization statistics over all the hidden units in the same layer as follows:

$\mu^{l} = \frac{1}{H}\sum^{H}_{i=1}a_{i}^{l}$

$\sigma^{l} = \sqrt{\frac{1}{H}\sum^{H}_{i=1}\left(a_{i}^{l}-\mu^{l}\right)^{2}}$

where $H$ denotes the number of hidden units in a layer and $a_{i}^{l}$ is the summed input to the $i$-th hidden unit in layer $l$. Under layer normalization, all the hidden units in a layer share the same normalization terms $\mu$ and $\sigma$, but different training cases have different normalization terms. Unlike batch normalization, layer normalization does not impose any constraint on the size of the mini-batch, and it can be used in the pure online regime with batch size 1.
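The two statistics translate almost term for term into code. Below is a minimal sketch for one training case, assuming the summed inputs $a_1^l, \dots, a_H^l$ arrive as a plain Python list. The excerpt does not include the learned per-unit gain and bias; they follow the original Layer Normalization paper (Ba et al., 2016) and are included here as an assumption. The function name `layer_norm` and the small `eps` constant are likewise hypothetical choices made for illustration.

```python
import math

def layer_norm(a, gain=None, bias=None, eps=1e-5):
    """Layer-normalize the summed inputs a = [a_1, ..., a_H] of ONE training case.

    Returns (normalized outputs, mu, sigma). gain and bias are per-unit
    parameters (assumed, following Ba et al., 2016); they default to 1 and 0,
    giving an approximately zero-mean, unit-variance output.
    """
    H = len(a)
    mu = sum(a) / H                                           # mu = (1/H) sum_i a_i
    sigma = math.sqrt(sum((a_i - mu) ** 2 for a_i in a) / H)  # sigma = sqrt((1/H) sum_i (a_i - mu)^2)
    gain = [1.0] * H if gain is None else gain
    bias = [0.0] * H if bias is None else bias
    denom = math.sqrt(sigma ** 2 + eps)                       # eps guards against sigma == 0
    h = [g * (a_i - mu) / denom + b for a_i, g, b in zip(a, gain, bias)]
    return h, mu, sigma

# One training case with H = 4 hidden units (made-up numbers).
h, mu, sigma = layer_norm([1.0, 2.0, 3.0, 4.0])
print(mu, round(sigma, 4))  # 2.5 1.118

# Batch size 1 is not a special case: every case is normalized on its own,
# so a mini-batch is just a loop over independent cases.
minibatch = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 10.0]]
outputs = [layer_norm(case)[0] for case in minibatch]
```

As a check, $\mu = (1+2+3+4)/4 = 2.5$ and $\sigma = \sqrt{(2.25+0.25+0.25+2.25)/4} = \sqrt{1.25} \approx 1.118$. The first case in `minibatch` therefore produces the same output it would produce if it were processed alone.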
