What is: SortCut Sinkhorn Attention?
Source | Sparse Sinkhorn Attention |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
SortCut Sinkhorn Attention is a variant of Sparse Sinkhorn Attention in which a post-sorting truncation of the input sequence is performed, essentially a hard top-k operation on the sorted input sequence blocks within the computational graph. While most attention models merely re-weight tokens or assign them near-zero weights during training, this mechanism explicitly and dynamically truncates the input sequence. Specifically:
$$Y = \text{Softmax}\left(Q\,\psi_S(K)^{\top}_{[:n]}\right)\psi_S(V)_{[:n]}$$

where $n$ is the SortCut budget hyperparameter and $\psi_S(\cdot)$ denotes the Sinkhorn-sorted sequence, truncated to its first $n$ positions.
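The truncation step can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the learned Sinkhorn sorting network is replaced by a given permutation `perm` (an assumption standing in for the sorted block order), and standard scaled dot-product attention is applied to only the first `n` sorted positions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sortcut_attention(Q, K, V, perm, n):
    """Attend only to the first n positions of the sorted keys/values.

    perm : stand-in for the Sinkhorn sorting of the sequence
    n    : the SortCut budget hyperparameter
    """
    K_sorted, V_sorted = K[perm], V[perm]
    # hard top-k truncation: everything past position n is dropped
    # from the computational graph entirely
    K_n, V_n = K_sorted[:n], V_sorted[:n]
    scores = Q @ K_n.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V_n

# Usage: 16 query positions attend to only n=4 of 16 sorted positions
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
Q, K, V = rng.standard_normal((16, 8)), rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
Y = sortcut_attention(Q, K, V, perm=rng.permutation(16), n=4)
print(Y.shape)  # (16, 8)
```

Because the truncated keys and values never enter the score computation, the attention cost drops from quadratic in sequence length to linear in the budget `n`.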