What is: Sparse Sinkhorn Attention?
Source | Sparse Sinkhorn Attention
Year | 2020
Data Source | CC BY-SA - https://paperswithcode.com
Sparse Sinkhorn Attention is an attention mechanism that reduces the memory complexity of dot-product attention and is capable of learning sparse attention outputs. It is based on the idea of differentiable sorting of internal representations within the self-attention module: the input sequence is divided into blocks, and a meta sorting network learns to rearrange these blocks. Sinkhorn normalization, which iteratively normalizes the rows and columns of the sorting matrix, turns the network's raw scores into a doubly stochastic matrix, i.e. a differentiable relaxation of a permutation. The actual attention mechanism then acts locally on the block-sorted sequence, so each query block attends to its matched block rather than to the full sequence.
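To make the pipeline concrete, below is a minimal PyTorch sketch of the two core steps: Sinkhorn normalization of a block-level sorting matrix, and attention over the block-sorted sequence. This is not the authors' implementation; the function names `sinkhorn_normalize` and `block_sorted_attention`, the mean-pooling of blocks, the linear `meta_net`, and the choice of 8 Sinkhorn iterations are all illustrative assumptions, and the real method includes further components (e.g. mixing with local attention) omitted here.

```python
import torch

def sinkhorn_normalize(log_scores: torch.Tensor, n_iters: int = 8) -> torch.Tensor:
    # Alternately normalize rows and columns in log space; the
    # exponentiated result converges toward a doubly stochastic
    # matrix, a differentiable relaxation of a permutation matrix.
    for _ in range(n_iters):
        log_scores = log_scores - torch.logsumexp(log_scores, dim=-1, keepdim=True)  # rows sum to 1
        log_scores = log_scores - torch.logsumexp(log_scores, dim=-2, keepdim=True)  # columns sum to 1
    return log_scores.exp()

def block_sorted_attention(x: torch.Tensor, block_size: int, meta_net) -> torch.Tensor:
    # x: (seq_len, dim). Split the sequence into blocks, let the meta
    # sorting network score a block-level rearrangement, relax it with
    # Sinkhorn normalization, softly sort the key/value blocks, then
    # attend within matched blocks only.
    n, d = x.shape
    nb = n // block_size
    blocks = x.view(nb, block_size, d)
    pooled = blocks.mean(dim=1)                            # (nb, d) block summaries
    perm = sinkhorn_normalize(meta_net(pooled))            # (nb, nb) soft permutation
    sorted_kv = torch.einsum('ij,jsd->isd', perm, blocks)  # softly rearranged blocks
    q, k, v = blocks, sorted_kv, sorted_kv
    # Local attention: score tensor is (nb, block_size, block_size),
    # not (seq_len, seq_len) -- this is where the memory saving comes from.
    attn = torch.softmax(q @ k.transpose(-1, -2) / d ** 0.5, dim=-1)
    return (attn @ v).reshape(n, d)

# Toy usage; the meta network is a hypothetical stand-in mapping
# pooled block features to block-permutation logits.
n, d, bs = 16, 8, 4
x = torch.randn(n, d)
meta_net = torch.nn.Linear(d, n // bs)
out = block_sorted_attention(x, bs, meta_net)
print(out.shape)  # torch.Size([16, 8])
```

Because scores are only computed between each query block and its matched key/value block, the attention tensor grows linearly in sequence length for a fixed block size, while the learned sorting still lets distant blocks attend to one another.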