What is: Gated Transformer-XL?
Source | Stabilizing Transformers for Reinforcement Learning |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
Gated Transformer-XL, or GTrXL, is a Transformer-based architecture for reinforcement learning. It modifies the original Transformer and its Transformer-XL variant to improve training stability and learning speed. Changes include:
- Applying layer normalization only to the input stream of each submodule. A key benefit of this reordering is that it enables an identity map from the input of the transformer at the first layer to the output of the transformer after the last layer. This is in contrast to the canonical transformer, where a series of layer normalization operations non-linearly transform the state encoding along the residual path.
- Replacing residual connections with gating layers. The authors' experiments found that GRUs were the most effective form of gating.
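
The two changes above can be sketched together for a single vector. This is a minimal illustration, not the authors' implementation: the names `layer_norm`, `GRUGate`, `gated_sublayer`, and `toy_submodule` are hypothetical, the submodule is a stand-in for attention or the feed-forward network, and the random weight initialization is arbitrary. The gate equations follow the usual GRU form; the ReLU on the submodule output and the positive bias in the update gate (`gate_bias`, which pushes the gate toward passing the input through unchanged at initialization) are assumptions based on the paper's description and are not stated in the summary above.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class GRUGate:
    """GRU-style gate g(x, y) that takes the place of the residual sum x + y."""

    def __init__(self, dim, gate_bias=2.0, scale=0.1, seed=0):
        rng = random.Random(seed)

        def mat():
            # Arbitrary small random weights, for illustration only.
            return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

        self.W_r, self.U_r = mat(), mat()
        self.W_z, self.U_z = mat(), mat()
        self.W_g, self.U_g = mat(), mat()
        self.gate_bias = gate_bias  # assumed: larger values keep the output closer to x

    def __call__(self, x, y):
        # Reset gate r and update gate z, both computed from the skip input x
        # and the submodule output y.
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W_r, y), matvec(self.U_r, x))]
        z = [sigmoid(a + b - self.gate_bias)
             for a, b in zip(matvec(self.W_z, y), matvec(self.U_z, x))]
        # Candidate state built from y and the reset-scaled x.
        rx = [ri * xi for ri, xi in zip(r, x)]
        h = [math.tanh(a + b)
             for a, b in zip(matvec(self.W_g, y), matvec(self.U_g, rx))]
        # Interpolate between the untouched input x and the candidate h.
        return [(1.0 - zi) * xi + zi * hi for zi, xi, hi in zip(z, x, h)]


def gated_sublayer(x, submodule, gate):
    """Layer norm on the submodule's input only, then gate instead of adding."""
    y = submodule(layer_norm(x))      # normalization is off the skip path
    y = [max(0.0, v) for v in y]      # ReLU on the submodule output (assumed)
    return gate(x, y)


def toy_submodule(v):
    """Hypothetical stand-in for an attention or feed-forward submodule."""
    return [2.0 * a for a in v]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 0.1]
    print(gated_sublayer(x, toy_submodule, GRUGate(dim=len(x))))
```

Because `x` reaches the gate without passing through `layer_norm`, a gate whose update value `z` is near zero returns `x` almost unchanged, which is the identity path described in the first bullet.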