
What is: PermuteFormer?

Source: PermuteFormer: Efficient Relative Position Encoding for Long Sequences
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

PermuteFormer is a Performer-based model with relative position encoding whose cost scales linearly with sequence length. It applies a position-dependent transformation to queries and keys to encode positional information into the attention module. The transformation is constructed so that the final output of self-attention does not depend on the absolute positions of tokens, only on their positions relative to one another.

In the accompanying figure, each token’s query / key feature is drawn as a row of blocks, with its elements marked in different colors. The position-aware permutation reorders the elements of each token’s query / key feature along the head size dimension within each attention head. The permutation applied to a query / key feature differs depending on the token’s position.
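
To make the idea concrete, here is a minimal Python sketch of the permutation step only, not of the full model. It assumes that a token at position p has one fixed permutation of its feature indices applied p times. That choice, along with the names `permute`, `score`, `perm` and `head_size`, is hypothetical and chosen for illustration rather than taken from the paper's code. Under that assumption, the dot product between a permuted query and a permuted key depends only on the offset between their positions.

```python
import random

def permute(vec, perm, times):
    """Reorder the elements of `vec` with the index permutation `perm`, repeated `times` times."""
    out = list(vec)
    for _ in range(times):
        out = [out[p] for p in perm]
    return out

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(0)
head_size = 8  # assumed feature size of one attention head

# Hypothetical fixed permutation of the feature indices for this head.
perm = list(range(head_size))
rng.shuffle(perm)

# Stand-in query / key features for two tokens.
q = [rng.random() for _ in range(head_size)]
k = [rng.random() for _ in range(head_size)]

def score(q_pos, k_pos):
    """Dot product after permuting q and k according to their token positions."""
    return dot(permute(q, perm, q_pos), permute(k, perm, k_pos))

# Both pairs have the same offset (k_pos - q_pos = 3) but different absolute positions.
a = score(2, 5)
b = score(7, 10)
print(abs(a - b) < 1e-9)
```

Shifting both positions by the same amount applies the same extra reordering to both vectors, so the terms of the sum are unchanged up to their order. This is why the two scores agree up to floating-point rounding.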