What is: Routed Attention?
Source | Efficient Content-Based Sparse Attention with Routing Transformers |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Routed Attention is an attention pattern proposed as part of the Routing Transformer architecture. Each attention module considers a clustering of the space: the current timestep attends only to context belonging to the same cluster. In other words, the current timestep's query is routed to a limited number of context elements through its cluster assignment. This can be contrasted with position-based patterns such as strided attention and those proposed with the Sparse Transformer.
In the attention pattern figure from the paper, the rows represent the outputs and the columns represent the inputs; the different colors indicate the cluster membership of each output token.
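To make the routing step concrete, below is a minimal NumPy sketch, under stated assumptions: the cluster centroids are taken as given (the paper learns them with online k-means over shared query/key projections and uses balanced cluster sizes rather than the simple nearest-centroid assignment shown here), and `routing_attention` with its arguments is an illustrative name, not the paper's API.

```python
import numpy as np

def routing_attention(Q, K, V, centroids):
    """Sketch: each query attends only to keys routed to its own cluster.

    Q, K, V: (n, d) arrays. centroids: (k, d) cluster centers, assumed
    pre-learned (the Routing Transformer maintains them via online k-means).
    """
    n, d = Q.shape
    # Assign queries and keys to their nearest centroid by dot-product
    # similarity (with normalized vectors, as in the paper, this matches
    # nearest-centroid k-means assignment).
    q_cluster = np.argmax(Q @ centroids.T, axis=-1)  # (n,)
    k_cluster = np.argmax(K @ centroids.T, axis=-1)  # (n,)
    out = np.zeros_like(V)
    for i in range(n):
        # Restrict the context to keys in the same cluster as query i.
        # (A causal model would additionally mask positions j > i.)
        idx = np.where(k_cluster == q_cluster[i])[0]
        if idx.size == 0:
            continue  # no context routed to this query in this sketch
        scores = Q[i] @ K[idx].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ V[idx]
    return out

# Toy usage: 16 tokens, 8-dim heads, 4 routing clusters.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
centroids = rng.normal(size=(4, 8))
print(routing_attention(Q, K, V, centroids).shape)  # (16, 8)
```

With clusters of size roughly n/k, each query computes attention over about n/k context elements instead of n, which is the source of the sparsity savings.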