What is: Gated Transformer-XL?

Source: Stabilizing Transformers for Reinforcement Learning
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Gated Transformer-XL, or GTrXL, is a Transformer-based architecture for reinforcement learning. It introduces architectural modifications that improve the stability and learning speed of the original Transformer and its Transformer-XL variant. The changes include:

  • Placing layer normalization only on the input stream of the submodules. A key benefit of this reordering is that it enables an identity map from the input of the transformer at the first layer to the output of the transformer after the last layer. This is in contrast to the canonical transformer, where a series of layer normalization operations non-linearly transforms the state encoding.
  • Replacing residual connections with gating layers. The authors' experiments found that GRUs were the most effective form of gating.
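
The two changes above can be sketched together in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the class name GRUGate, the helper gated_sublayer, the weight initialization, the ReLU on the submodule output, and the gate bias value are all assumptions chosen for the example, and the submodule is a stand-in function rather than real attention or an MLP. The gate follows the usual GRU update, with a positive bias subtracted from the update gate so that the layer starts out close to the identity map.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance (no learned scale/shift).
    mean = sum(x) / len(x)
    var = sum((a - mean) ** 2 for a in x) / len(x)
    return [(a - mean) / math.sqrt(var + eps) for a in x]


class GRUGate:
    """GRU-style gate that replaces the residual sum x + y (illustrative names)."""

    def __init__(self, dim, bias=2.0, scale=0.1, seed=0):
        rng = random.Random(seed)

        def mat():
            return [[rng.uniform(-scale, scale) for _ in range(dim)]
                    for _ in range(dim)]

        self.Wr, self.Ur = mat(), mat()
        self.Wz, self.Uz = mat(), mat()
        self.Wg, self.Ug = mat(), mat()
        self.bias = bias  # assumed value; larger bias keeps the gate closer to identity

    def __call__(self, x, y):
        # x: skip-stream input, y: submodule output.
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wr, y), matvec(self.Ur, x))]
        z = [sigmoid(a + b - self.bias)
             for a, b in zip(matvec(self.Wz, y), matvec(self.Uz, x))]
        rx = [ri * xi for ri, xi in zip(r, x)]
        h = [math.tanh(a + b)
             for a, b in zip(matvec(self.Wg, y), matvec(self.Ug, rx))]
        # When z is near 0 the output is approximately x (the identity path).
        return [(1.0 - zi) * xi + zi * hi for zi, xi, hi in zip(z, x, h)]


def gated_sublayer(x, sublayer, gate):
    # Layer normalization is applied only to the submodule's input stream;
    # the skip stream x reaches the gate untouched.
    y = sublayer(layer_norm(x))
    y = [max(0.0, a) for a in y]  # ReLU on the submodule output (assumption)
    return gate(x, y)


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 0.25]
    toy_sublayer = lambda v: [2.0 * a for a in v]  # placeholder for attention/MLP
    print(gated_sublayer(x, toy_sublayer, GRUGate(dim=4)))
```

Because the skip stream is never normalized, making the gate bias very large drives the update gate toward zero, so the block returns its input almost unchanged. That is the identity map the first bullet describes.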