What is: BigBird?
| Source | Big Bird: Transformers for Longer Sequences |
| Year | 2020 |
| Data Source | CC BY-SA - https://paperswithcode.com |
BigBird is a Transformer with a sparse attention mechanism that reduces the quadratic dependency of self-attention to linear in the number of tokens. BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. In particular, BigBird consists of three main parts:
- A set of global tokens attending to all parts of the sequence.
- All tokens attending to a set of local neighboring tokens.
- All tokens attending to a set of random tokens.
Together these yield a high-performing sparse attention mechanism that scales to sequences up to 8x longer than previously possible on similar hardware.
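The three attention patterns above can be sketched as a boolean attention mask. This is a simplified, hypothetical illustration at the level of individual token positions; the actual BigBird implementation groups tokens into blocks for hardware efficiency, and the parameter names (`num_global`, `window`, `num_random`) are assumptions, not the paper's notation.

```python
import numpy as np

def bigbird_mask(seq_len, num_global=2, window=3, num_random=2, seed=0):
    """Combine BigBird's three sparse patterns into one attention mask.

    mask[i, j] == True means query token i may attend to key token j.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # 1) Global tokens: the first `num_global` positions attend everywhere,
    #    and every token attends back to them.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # 2) Local window: each token attends to its neighbors within `window`.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 3) Random attention: each token attends to a few random positions.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask

m = bigbird_mask(16)
print(m.sum(), "of", m.size, "entries attended")
```

Because each row of the mask has a bounded number of `True` entries (a constant from the window, global, and random budgets rather than a term growing with `seq_len`), the total attention cost grows linearly in the number of tokens instead of quadratically.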