What is Subformer?
Source | Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
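Subformer is a Transformer that combines two parameter-saving techniques: sandwich-style parameter sharing, which overcomes the shortcomings of naive cross-layer parameter sharing in generative models, and self-attentive embedding factorization (SAFE). In sandwich-style sharing, the central layers reuse one set of parameters while the first and last layers keep their own. In SAFE, a small self-attention layer is used to reduce the embedding parameter count.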
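The two ideas can be illustrated with a small parameter-counting sketch. It is not the authors' implementation. It assumes that sandwich-style sharing gives the first and last layers their own parameters and lets all central layers reuse one set, and that SAFE consists of a small embedding table, one self-attention layer at the small embedding width, and a projection up to the model width. The function names, the four-matrix attention layout, and the sizes used are hypothetical choices made for illustration; biases are ignored.

```python
def sandwich_layer_assignment(num_layers):
    """Return, for each layer, the name of the parameter set it uses.

    Assumption: the first and last layers are independent and every
    central layer reuses a single shared parameter set.
    """
    if num_layers < 3:
        # Too few layers to have a shared middle; nothing is shared.
        return [f"layer_{i}" for i in range(num_layers)]
    return ["first"] + ["shared_middle"] * (num_layers - 2) + ["last"]


def count_unique_parameter_sets(assignment):
    """Number of distinct layer parameter sets that must be stored."""
    return len(set(assignment))


def full_embedding_params(vocab_size, d_model):
    """Parameters in a standard embedding table of width d_model."""
    return vocab_size * d_model


def safe_embedding_params(vocab_size, d_embed, d_model):
    """Rough parameter count for a SAFE-style embedding (illustrative).

    Assumed layout: a table of width d_embed, one self-attention layer
    with query, key, value, and output matrices of size d_embed x d_embed,
    and a linear projection from d_embed up to d_model.
    """
    table = vocab_size * d_embed
    attention = 4 * d_embed * d_embed
    up_projection = d_embed * d_model
    return table + attention + up_projection


if __name__ == "__main__":
    assignment = sandwich_layer_assignment(6)
    print(assignment)
    print(count_unique_parameter_sets(assignment), "parameter sets for 6 layers")
    print(full_embedding_params(32000, 512), "parameters in a full embedding")
    print(safe_embedding_params(32000, 128, 512), "parameters in the factorized sketch")
```

With these hypothetical sizes, a 6-layer stack stores 3 layer parameter sets instead of 6, and the factorized embedding is roughly a quarter the size of the full table. The added attention layer and projection cost little because they scale with the small embedding width rather than with the vocabulary size.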