
What is: NormFormer?

Source: NormFormer: Improved Transformer Pretraining with Extra Normalization
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

NormFormer is a Pre-LN transformer variant that adds three normalization operations to each layer: a Layer Norm after self-attention, head-wise scaling of the self-attention outputs, and a Layer Norm after the first fully connected layer. These modifications introduce only a small number of additional learnable parameters, giving each layer a cost-effective way to change the magnitude of its features, and therefore the magnitude of the gradients flowing to subsequent components.
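Below is a minimal sketch of how these three extra operations might slot into a standard Pre-LN transformer layer in PyTorch. All class, parameter, and dimension names are illustrative, not taken from the paper's code; in particular, because nn.MultiheadAttention does not expose per-head outputs before its output projection, the head-wise scaling here is applied to the attention module's output as an approximation of the paper's per-head scaling.

```python
import torch
import torch.nn as nn


class NormFormerLayer(nn.Module):
    """Sketch of a Pre-LN transformer layer with NormFormer's three extra
    normalization operations (names and defaults are illustrative)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.n_heads = n_heads
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        # Standard Pre-LN norms before each sub-block
        self.pre_attn_norm = nn.LayerNorm(d_model)
        self.pre_ffn_norm = nn.LayerNorm(d_model)

        # NormFormer addition 1: Layer Norm applied to the self-attention output
        self.post_attn_norm = nn.LayerNorm(d_model)
        # NormFormer addition 2: one learnable scale per attention head
        self.head_scale = nn.Parameter(torch.ones(n_heads))
        # NormFormer addition 3: Layer Norm after the first fully connected layer
        self.ffn_mid_norm = nn.LayerNorm(d_ff)

        self.ffn_in = nn.Linear(d_model, d_ff)
        self.ffn_out = nn.Linear(d_ff, d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention sub-block (Pre-LN)
        h = self.pre_attn_norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)

        # Head-wise scaling (approximation): split the output into heads and
        # multiply each head's slice by its learned scalar.
        b, s, d = attn_out.shape
        attn_out = attn_out.view(b, s, self.n_heads, d // self.n_heads)
        attn_out = attn_out * self.head_scale.view(1, 1, -1, 1)
        attn_out = attn_out.reshape(b, s, d)

        # Extra Layer Norm on the attention output before the residual add
        x = x + self.post_attn_norm(attn_out)

        # Feed-forward sub-block with the extra mid-FFN Layer Norm
        h = self.pre_ffn_norm(x)
        h = self.ffn_mid_norm(self.act(self.ffn_in(h)))
        x = x + self.ffn_out(h)
        return x


# Example usage (hypothetical shapes): batch of 2 sequences of length 16
layer = NormFormerLayer()
y = layer(torch.randn(2, 16, 512))
```

Because each extra operation is a Layer Norm or a per-head scalar, the added parameter count is tiny relative to the attention and feed-forward weights, which is what makes the modification cheap while still letting every layer rescale its outputs.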