What is: PermuteFormer?
Source | PermuteFormer: Efficient Relative Position Encoding for Long Sequences |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
PermuteFormer is a Performer-based model with relative position encoding that scales linearly with sequence length. PermuteFormer applies a position-dependent transformation to queries and keys to encode positional information into the attention module. This transformation is carefully crafted so that the final output of self-attention is not affected by the absolute positions of tokens.
Each token’s query / key feature is illustrated as a row of blocks in the figure, and its elements are marked with different colors. The position-aware permutation permutes the elements of each token’s query / key feature along the head-size dimension in each attention head. Depending on the token’s position, a different permutation is applied to its query / key feature.
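The position-dependent permutation described above can be sketched as follows. This is a minimal NumPy illustration (not the paper's implementation): a fixed base permutation of the head-size dimension is composed with itself once per position, so a query at position i and a key at position j are permuted i and j times respectively. The resulting dot product then depends only on the relative position j − i, which is the property PermuteFormer relies on. The permutation choice and dimensions here are arbitrary assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
head_dim = 8

# A fixed random permutation of the head-size dimension (hypothetical choice).
base_perm = rng.permutation(head_dim)

def permute(x, times):
    """Apply the base permutation `times` times, i.e. the permutation
    raised to the power of the token's position."""
    idx = np.arange(head_dim)
    for _ in range(times):
        idx = idx[base_perm]  # compose the permutation with itself
    return x[idx]

q = rng.standard_normal(head_dim)  # one query feature (single head)
k = rng.standard_normal(head_dim)  # one key feature

# Query at position i, key at position j: permute each by its position.
i, j = 3, 7
score = permute(q, i) @ permute(k, j)

# Shifting both positions by the same offset leaves the score unchanged,
# so the attention score depends only on the relative position j - i.
score_shifted = permute(q, i + 2) @ permute(k, j + 2)
assert np.isclose(score, score_shifted)
```

In practice the permutation is applied per attention head after the Performer feature map, but the shift-invariance shown by the assertion is the same mechanism that makes the encoding relative rather than absolute.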