What is: Routing Attention?

Source: Efficient Content-Based Sparse Attention with Routing Transformers
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

Routing Attention is an attention pattern proposed as part of the Routing Transformer architecture. Each attention module considers a clustering of the space: the current timestep only attends to context belonging to the same cluster. In other words, the current timestep's query is routed to a limited number of context positions through its cluster assignment. This content-based routing can be contrasted with fixed, position-based patterns such as the strided attention proposed with the Sparse Transformer.
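
As a rough illustration of the routing idea, the sketch below clusters queries and keys in a shared space with a few plain k-means steps, then lets each query attend only to causally earlier keys in its own cluster. The function name `routing_attention`, the random centroid initialization, and the fixed number of k-means iterations are illustrative assumptions for this sketch, not the paper's exact procedure (which learns centroids online during training):

```python
import numpy as np

def routing_attention(Q, K, V, n_clusters=4, n_kmeans_iters=5, seed=0):
    """Illustrative sketch: each query attends only to keys assigned to
    the same cluster, rather than to all n positions as in dense O(n^2)
    attention. Centroids start random and are refined with a few k-means
    steps -- a stand-in for the learned centroids in the actual model."""
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    centroids = rng.normal(size=(n_clusters, d))

    # Cluster queries and keys together in a shared space.
    X = np.concatenate([Q, K], axis=0)
    for _ in range(n_kmeans_iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = X[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)

    # Final cluster assignments after the last centroid update.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    q_cluster = assign[:n]   # cluster id of each query position
    k_cluster = assign[n:]   # cluster id of each key position

    out = np.zeros_like(V)
    for i in range(n):
        # Route query i to causally earlier keys in the same cluster.
        idx = np.where((k_cluster == q_cluster[i]) & (np.arange(n) <= i))[0]
        if idx.size == 0:
            continue  # no routed context: leave this position as zeros
        scores = Q[i] @ K[idx].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ V[idx]
    return out

# Toy usage: 16 positions, model dimension 8.
rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
print(routing_attention(Q, K, V).shape)  # (16, 8)
```

For context, the paper additionally balances cluster sizes and updates centroids with a moving average during training, which keeps the overall cost around O(n^1.5) rather than the O(n^2) of dense attention.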

In the paper's illustration of this pattern, the rows represent the outputs while the columns represent the inputs, and the different colors indicate the cluster membership of each output token.