What is: Routing Transformer?
Source | Efficient Content-Based Sparse Attention with Routing Transformers |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
The Routing Transformer is a Transformer that endows self-attention with a sparse routing module based on online k-means. Each attention module considers a clustering of the space: the current timestep only attends to context belonging to the same cluster. In other words, the query at the current timestep is routed to a limited set of context elements through its cluster assignment.
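A minimal sketch of this routing idea is shown below, assuming a single attention head and a fixed set of centroids passed in as an argument (the paper maintains centroids online via k-means updates, adds causal masking, and balances cluster sizes; those details are omitted here). The function name `routing_attention` and the assignment rule (nearest centroid by dot product) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def routing_attention(q, k, v, centroids):
    """Sparse attention sketch: each query attends only to the keys that
    fall in the same cluster (cluster = nearest centroid by dot product).

    q, k, v:   (seq_len, d) tensors for a single head
    centroids: (num_clusters, d) cluster centres; fixed inputs here,
               whereas the paper updates them online with k-means.
    """
    seq_len, d = q.shape
    # Route queries and keys to their nearest centroid.
    q_assign = (q @ centroids.t()).argmax(dim=-1)  # (seq_len,)
    k_assign = (k @ centroids.t()).argmax(dim=-1)  # (seq_len,)

    out = torch.zeros_like(v)
    for c in range(centroids.shape[0]):
        q_idx = (q_assign == c).nonzero(as_tuple=True)[0]
        k_idx = (k_assign == c).nonzero(as_tuple=True)[0]
        if q_idx.numel() == 0 or k_idx.numel() == 0:
            continue
        # Standard scaled dot-product attention, restricted to the cluster.
        scores = (q[q_idx] @ k[k_idx].t()) / d ** 0.5
        out[q_idx] = F.softmax(scores, dim=-1) @ v[k_idx]
    return out

# Toy usage: 16 tokens, 8-dim head, 4 clusters.
torch.manual_seed(0)
q, k, v = (torch.randn(16, 8) for _ in range(3))
centroids = torch.randn(4, 8)
print(routing_attention(q, k, v, centroids).shape)  # torch.Size([16, 8])
```

Because each query only computes scores against the keys in its own cluster, the attention cost drops from quadratic in the sequence length to roughly the sum of squared cluster sizes, which is the source of the efficiency gain the paper targets.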