
What is: SortCut Sinkhorn Attention?

Source: Sparse Sinkhorn Attention
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

SortCut Sinkhorn Attention is a variant of Sparse Sinkhorn Attention that performs a post-sorting truncation of the input sequence, essentially applying a hard top-k operation to the sorted input sequence blocks within the computational graph. While most attention models merely re-weight inputs or assign them near-zero weights during training, this allows the model to explicitly and dynamically truncate the input sequence. Specifically:

$$Y = \text{Softmax}\left(Q\,\psi_S(K)^{T}_{[:n]}\right)\psi_S(V)_{[:n]}$$

where $n$ is the SortCut budget hyperparameter (the number of sorted blocks retained after truncation) and $\psi_S$ denotes the Sinkhorn sorting operator applied block-wise to the keys and values.
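
To make the truncation concrete, below is a minimal PyTorch sketch, assuming the sequence length divides evenly into blocks. The class and parameter names (`SortCutAttention`, `block_scorer`, `budget`) are illustrative, not from the paper, and a hard `argsort` over learned block scores stands in for the differentiable Sinkhorn sorting network $\psi_S$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SortCutAttention(nn.Module):
    """Sketch of SortCut Sinkhorn Attention (illustrative, simplified)."""

    def __init__(self, dim, block_size, budget):
        super().__init__()
        self.block_size = block_size
        self.budget = budget  # n: number of sorted blocks kept after truncation
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        # Hypothetical scorer used to rank blocks; the paper instead learns a
        # doubly-stochastic permutation via Sinkhorn normalization.
        self.block_scorer = nn.Linear(dim, 1)

    def forward(self, x):
        b, seq_len, d = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)

        # Split keys/values into blocks: (b, num_blocks, block_size, d).
        nb = seq_len // self.block_size
        k_blocks = k.view(b, nb, self.block_size, d)
        v_blocks = v.view(b, nb, self.block_size, d)

        # Score each block by its mean representation and sort descending;
        # this hard argsort approximates the sorting network psi_S.
        scores = self.block_scorer(k_blocks.mean(dim=2)).squeeze(-1)  # (b, nb)
        order = scores.argsort(dim=-1, descending=True)
        idx = order[:, :, None, None].expand(-1, -1, self.block_size, d)
        k_sorted = k_blocks.gather(1, idx)
        v_sorted = v_blocks.gather(1, idx)

        # SortCut: hard truncation to the first n blocks, i.e. the [:n] slice.
        n = self.budget
        k_cut = k_sorted[:, :n].reshape(b, n * self.block_size, d)
        v_cut = v_sorted[:, :n].reshape(b, n * self.block_size, d)

        # Y = Softmax(Q psi_S(K)^T [:n]) psi_S(V)[:n]
        attn = F.softmax(q @ k_cut.transpose(-2, -1) / d ** 0.5, dim=-1)
        return attn @ v_cut
```

Because each query attends only to the `budget * block_size` retained positions, the per-query attention cost is independent of the full sequence length, which is the practical payoff of the hard truncation.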