What is: BigBird?
| Source | Big Bird: Transformers for Longer Sequences | 
| Year | 2020 | 
| Data Source | CC BY-SA - https://paperswithcode.com | 
BigBird is a Transformer with a sparse attention mechanism that reduces the quadratic dependency of self-attention on sequence length to a linear one. BigBird is a universal approximator of sequence functions and is Turing complete, so it preserves these properties of the quadratic, full-attention model. In particular, BigBird's sparse attention consists of three main parts:
- A set of global tokens attending to all parts of the sequence.
- All tokens attending to a set of local neighboring tokens.
- All tokens attending to a set of random tokens.
This leads to a high-performing attention mechanism that scales to much longer sequences (up to 8x longer than was previously possible on similar hardware).
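The three attention patterns can be illustrated with a small mask-building sketch. This is not the reference implementation: the function name `bigbird_mask` and the parameters `num_global`, `window`, `num_random`, and `seed` are hypothetical, the global tokens are assumed to be the first positions of the sequence, and the mask is built densely with NumPy purely for readability.

```python
import numpy as np

def bigbird_mask(seq_len, num_global=2, window=3, num_random=2, seed=0):
    """Return a boolean (seq_len, seq_len) mask.

    mask[i, j] is True when query position i may attend to key position j.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Global tokens (assumed here to be the first num_global positions):
    # they attend to every position, and every position attends to them.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # Local window: each token attends to neighbors within window // 2
    # positions on either side (and to itself).
    half = window // 2
    idx = np.arange(seq_len)
    mask |= np.abs(idx[:, None] - idx[None, :]) <= half

    # Random attention: each token attends to num_random keys chosen
    # uniformly at random without replacement.
    for i in range(seq_len):
        cols = rng.choice(seq_len, size=min(num_random, seq_len), replace=False)
        mask[i, cols] = True

    return mask


mask = bigbird_mask(seq_len=64)
# Every non-global row has at most num_global + window + num_random allowed keys.
per_row = mask[2:].sum(axis=1)
```

In this sketch, each non-global row allows at most `num_global + window + num_random` keys regardless of `seq_len`, so the number of attended pairs grows linearly with sequence length. The dense boolean mask itself still takes quadratic memory, so it only shows the pattern; an efficient implementation would avoid materializing it.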
