What is: T-Fixup?
| Source | Improving Transformer Optimization Through Better Initialization | 
| Year | 2020 | 
| Data Source | CC BY-SA - https://paperswithcode.com | 
T-Fixup is an initialization method for Transformers that aims to remove the need for layer normalization and warmup. The initialization procedure is as follows:
- Apply Xavier initialization for all parameters excluding input embeddings. Use Gaussian initialization $\mathcal{N}\left(0, d^{-\frac{1}{2}}\right)$ for input embeddings, where $d$ is the embedding dimension.
- Scale $v$ and $w$ matrices in each decoder attention block, weight matrices in each decoder MLP block, and input embeddings $x$ and $y$ in the encoder and decoder by $(9N)^{-\frac{1}{4}}$, where $N$ is the number of decoder layers.
- Scale $v$ and $w$ matrices in each encoder attention block and weight matrices in each encoder MLP block by $0.67N^{-\frac{1}{4}}$, where $N$ is the number of encoder layers.
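The steps above can be sketched in NumPy. This is an illustrative sketch only: the function names, the layer counts, the square $d \times d$ projection shapes, and the 4x MLP width are my assumptions, not part of the source.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    # Xavier/Glorot uniform: U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

def t_fixup_init(d_model, n_enc, n_dec, vocab_size):
    """Sketch of the T-Fixup procedure (names and shapes are illustrative)."""
    params = {}

    dec_scale = (9.0 * n_dec) ** -0.25        # (9N)^{-1/4}, N = decoder layers
    enc_scale = 0.67 * n_enc ** -0.25         # 0.67 * N^{-1/4}, N = encoder layers

    # Input embeddings: Gaussian with variance d^{-1/2} (std = d^{-1/4}),
    # then scaled by (9N)^{-1/4} like the other decoder-side parameters.
    params["embed"] = dec_scale * rng.normal(
        0.0, d_model ** -0.25, size=(vocab_size, d_model))

    for i in range(n_enc):
        # Encoder: scale v (value projection) and w (attention output)
        # plus the MLP weights; q and k keep plain Xavier init.
        params[f"enc{i}.q"] = xavier_uniform(d_model, d_model)
        params[f"enc{i}.k"] = xavier_uniform(d_model, d_model)
        params[f"enc{i}.v"] = enc_scale * xavier_uniform(d_model, d_model)
        params[f"enc{i}.w"] = enc_scale * xavier_uniform(d_model, d_model)
        params[f"enc{i}.mlp"] = enc_scale * xavier_uniform(d_model, 4 * d_model)

    for i in range(n_dec):
        # Decoder: same pattern with the (9N)^{-1/4} scale.
        params[f"dec{i}.q"] = xavier_uniform(d_model, d_model)
        params[f"dec{i}.k"] = xavier_uniform(d_model, d_model)
        params[f"dec{i}.v"] = dec_scale * xavier_uniform(d_model, d_model)
        params[f"dec{i}.w"] = dec_scale * xavier_uniform(d_model, d_model)
        params[f"dec{i}.mlp"] = dec_scale * xavier_uniform(d_model, 4 * d_model)

    return params

params = t_fixup_init(d_model=512, n_enc=6, n_dec=6, vocab_size=1000)
```

With these scales in place, the network can be trained without layer normalization and without a learning-rate warmup phase.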
