What is: Grouped-query attention?
Source | GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints |
Year | 2023 |
Data Source | CC BY-SA - https://paperswithcode.com |
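Grouped-query attention (GQA) is an interpolation of multi-query and multi-head attention: query heads are divided into groups, and each group shares a single key head and value head. It achieves quality close to multi-head attention at a speed comparable to multi-query attention.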
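The interpolation can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the function names (`softmax`, `attend`, `grouped_query_attention`) and the nested-list layout (`heads x positions x dimensions`) are assumptions chosen for readability, and masking, batching, and learned projections are omitted. Each query head `h` reads the key/value head at index `h // group_size`, so one key/value head reduces to multi-query attention and one key/value head per query head reduces to multi-head attention.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q_head, k_head, v_head):
    """Scaled dot-product attention for a single head (no masking)."""
    d = len(q_head[0])
    out = []
    for q_vec in q_head:
        # Similarity of this query position to every key position.
        scores = [
            sum(a * b for a, b in zip(q_vec, k_vec)) / math.sqrt(d)
            for k_vec in k_head
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([
            sum(w * v_vec[j] for w, v_vec in zip(weights, v_head))
            for j in range(len(v_head[0]))
        ])
    return out


def grouped_query_attention(q, k, v):
    """q has H query heads; k and v each have G key/value heads, with G dividing H."""
    num_q_heads, num_kv_heads = len(q), len(k)
    if len(v) != num_kv_heads:
        raise ValueError("k and v must have the same number of heads")
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("number of query heads must be a multiple of key/value heads")
    group_size = num_q_heads // num_kv_heads
    # Query head h shares the key/value head of its group.
    return [
        attend(q[h], k[h // group_size], v[h // group_size])
        for h in range(num_q_heads)
    ]


# Illustrative data: 4 query heads, 2 key/value heads, 2 positions, dimension 2.
q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]
k = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
v = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
out = grouped_query_attention(q, k, v)
```

In this example, query heads 0 and 1 share the first key/value head and heads 2 and 3 share the second, so only two key/value heads need to be stored instead of four.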