What is: Sandwich Transformer?
Source | Improving Transformer Models by Reordering their Sublayers |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
A Sandwich Transformer is a Transformer variant that reorders the self-attention and feedforward sublayers of the architecture to achieve better performance, without adding sublayers or parameters. The reordering follows from the authors' analysis, which found that models with more self-attention sublayers toward the bottom and more feedforward sublayers toward the top tend to perform better than the standard interleaved ordering.
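The reordering can be illustrated with a minimal sketch. It assumes the pattern usually attributed to the paper: for a model with `n` self-attention (`s`) and `n` feedforward (`f`) sublayers and a "sandwich coefficient" `k`, the order is `k` self-attention sublayers, then `n - k` interleaved `sf` pairs, then `k` feedforward sublayers. The function name `sandwich_order` and the string encoding are hypothetical, chosen only for illustration, and are not taken from the authors' code.

```python
def sandwich_order(n: int, k: int) -> str:
    """Return a sublayer ordering as a string of 's' (self-attention)
    and 'f' (feedforward) characters, bottom of the stack first.

    Assumed pattern: s^k (sf)^(n-k) f^k, where k is the sandwich coefficient.
    k = 0 gives the standard interleaved Transformer ordering.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return "s" * k + "sf" * (n - k) + "f" * k


# Standard interleaved ordering with 4 sublayer pairs (k = 0).
baseline = sandwich_order(4, 0)

# Sandwich ordering with the same sublayers, k = 2.
sandwich = sandwich_order(4, 2)

print(baseline)  # sfsfsfsf
print(sandwich)  # sssfsfff
```

Both orderings contain the same number of each sublayer type, so under this assumption the parameter count is unchanged; only the positions of the sublayers differ.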