What is: Global-Local Attention?
Source | ETC: Encoding Long and Structured Inputs in Transformers |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Global-Local Attention is a type of attention mechanism used in the ETC architecture. ETC receives two separate input sequences: the global input and the long input. Typically, the long input contains the input a standard Transformer would receive, while the global input contains a much smaller number of auxiliary tokens. Attention is then split into four separate pieces: global-to-global (g2g), global-to-long (g2l), long-to-global (l2g), and long-to-long (l2l). Attention in the l2l piece (the most computationally expensive piece) is restricted to a fixed local radius. To compensate for this limited attention span, the tokens in the global input have unrestricted attention, so long input tokens can transfer information to each other through global input tokens. Accordingly, the g2g, g2l, and l2g pieces of attention are unrestricted.
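As a rough illustration of how the four pieces fit together, here is a minimal NumPy sketch (not the ETC implementation): it uses a single head, omits the learned query/key/value projections, and materializes a dense mask for the l2l band. The function names and the `radius` parameter are illustrative only.

```python
import numpy as np

def local_band_mask(n_long, radius):
    """Boolean mask letting each long token attend only within a fixed radius."""
    idx = np.arange(n_long)
    return np.abs(idx[:, None] - idx[None, :]) <= radius

def masked_softmax(scores, mask):
    """Softmax over the last axis, with disallowed positions masked out."""
    scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

def global_local_attention(x_global, x_long, radius):
    """Sketch of the four attention pieces (g2g, g2l, l2g, l2l) on raw embeddings."""
    n_g, n_l = x_global.shape[0], x_long.shape[0]
    scale = 1.0 / np.sqrt(x_long.shape[-1])
    ctx = np.concatenate([x_global, x_long], axis=0)

    # g2g and g2l: global tokens attend to everything (unrestricted).
    g_scores = x_global @ ctx.T * scale
    g_out = masked_softmax(g_scores, np.ones_like(g_scores, dtype=bool)) @ ctx

    # l2g: unrestricted; l2l: restricted to the fixed local radius.
    l_scores = x_long @ ctx.T * scale
    l_mask = np.concatenate(
        [np.ones((n_l, n_g), dtype=bool), local_band_mask(n_l, radius)], axis=1)
    l_out = masked_softmax(l_scores, l_mask) @ ctx
    return g_out, l_out

# Toy usage: 4 global tokens, 32 long tokens, hidden size 16, radius 3.
rng = np.random.default_rng(0)
g_out, l_out = global_local_attention(rng.normal(size=(4, 16)),
                                      rng.normal(size=(32, 16)), radius=3)
print(g_out.shape, l_out.shape)  # (4, 16) (32, 16)
```

In an efficient implementation the l2l piece would be computed in banded form, so its cost scales with the radius rather than with the square of the long-input length; the dense mask above is only for clarity.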