
What is: Global-Local Attention?

Source: ETC: Encoding Long and Structured Inputs in Transformers
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

Global-Local Attention is a type of attention mechanism used in the ETC architecture. ETC receives two separate input sequences: the global input $x^{g} = (x^{g}_{1}, \dots, x^{g}_{n_{g}})$ and the long input $x^{l} = (x^{l}_{1}, \dots, x^{l}_{n_{l}})$. Typically, the long input contains the input a standard Transformer would receive, while the global input contains a much smaller number of auxiliary tokens ($n_{g} \ll n_{l}$). Attention is then split into four separate pieces: global-to-global (g2g), global-to-long (g2l), long-to-global (l2g), and long-to-long (l2l). Attention in the l2l piece (the most computationally expensive one) is restricted to a fixed radius $r \ll n_{l}$. To compensate for this limited attention span, the tokens in the global input have unrestricted attention, so long input tokens can transfer information to each other through global input tokens. Accordingly, the g2g, g2l, and l2g pieces of attention are unrestricted.
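To make the four pieces concrete, here is a minimal sketch that builds the corresponding boolean attention masks with NumPy. The function names (`etc_attention_masks`, `local_band_mask`) are illustrative, not taken from the ETC codebase, and the sketch only covers the masking pattern; the actual model adds relative position encodings and the usual query/key/value projections on top of it.

```python
import numpy as np

def local_band_mask(n_long: int, radius: int) -> np.ndarray:
    """Long token i may attend to long token j iff |i - j| <= radius."""
    idx = np.arange(n_long)
    return np.abs(idx[:, None] - idx[None, :]) <= radius

def etc_attention_masks(n_global: int, n_long: int, radius: int):
    """Build the four attention masks of global-local attention.

    g2g, g2l, and l2g are unrestricted; only l2l is limited to a fixed radius.
    """
    g2g = np.ones((n_global, n_global), dtype=bool)  # global-to-global: full
    g2l = np.ones((n_global, n_long), dtype=bool)    # global-to-long: full
    l2g = np.ones((n_long, n_global), dtype=bool)    # long-to-global: full
    l2l = local_band_mask(n_long, radius)            # long-to-long: banded, radius r
    return g2g, g2l, l2g, l2l

# Example: a small global input (n_g = 2) paired with a longer input (n_l = 8)
g2g, g2l, l2g, l2l = etc_attention_masks(n_global=2, n_long=8, radius=1)
print(l2l.astype(int))  # banded matrix: each long token only sees its neighbors
```

Because only the banded l2l piece scales with the long input, the number of attended pairs is roughly $O(n_{g}(n_{g}+n_{l}) + n_{l}(2r+1))$ rather than the quadratic $O((n_{g}+n_{l})^{2})$ of full attention, which is what makes long inputs tractable.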