
What is: Relative Position Encodings?

Source: Self-Attention with Relative Position Representations
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

Relative Position Encodings are a type of position embedding for Transformer-based models that attempts to exploit pairwise, relative positional information. Relative positional information is supplied to the model on two levels: values and keys. This becomes apparent in the two modified self-attention equations shown below. First, relative positional information is supplied to the model as an additional component to the keys:

$$e_{ij} = \frac{x_{i}W^{Q}\left(x_{j}W^{K} + a^{K}_{ij}\right)^{T}}{\sqrt{d_{z}}}$$

Here $a^{K}_{ij}$ is an edge representation for the inputs $x_{i}$ and $x_{j}$. The softmax operation remains unchanged from vanilla self-attention. Then relative positional information is supplied again as a sub-component of the values matrix:

$$z_{i} = \sum_{j=1}^{n}\alpha_{ij}\left(x_{j}W^{V} + a_{ij}^{V}\right)$$

In other words, instead of simply combining semantic embeddings with absolute positional ones, relative positional information is added to keys and values on the fly during attention calculation.
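To make the two equations above concrete, here is a minimal NumPy sketch of single-head self-attention with relative position representations. It follows the formulas directly; the function and variable names (`relative_self_attention`, `a_k`, `a_v`) are illustrative rather than taken from any reference implementation, and the random `a_k`/`a_v` tensors stand in for the learned edge embeddings that would normally be looked up from the (clipped) relative distance between positions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(x, W_q, W_k, W_v, a_k, a_v):
    """Single-head self-attention with relative position representations.

    x            : (n, d_x) input embeddings
    W_q, W_k, W_v: (d_x, d_z) projection matrices
    a_k          : (n, n, d_z) edge representations a^K_{ij} added to the keys
    a_v          : (n, n, d_z) edge representations a^V_{ij} added to the values
    """
    d_z = W_q.shape[1]
    q, k, v = x @ W_q, x @ W_k, x @ W_v           # each (n, d_z)

    # e_{ij} = x_i W^Q (x_j W^K + a^K_{ij})^T / sqrt(d_z)
    e = np.einsum("id,ijd->ij", q, k[None, :, :] + a_k) / np.sqrt(d_z)

    # The softmax over j is unchanged from vanilla self-attention.
    alpha = softmax(e, axis=-1)                   # (n, n)

    # z_i = sum_j alpha_{ij} (x_j W^V + a^V_{ij})
    z = np.einsum("ij,ijd->id", alpha, v[None, :, :] + a_v)
    return z                                      # (n, d_z)

# Toy usage with random weights and edge representations.
rng = np.random.default_rng(0)
n, d_x, d_z = 5, 16, 16
x = rng.normal(size=(n, d_x))
W_q = rng.normal(size=(d_x, d_z))
W_k = rng.normal(size=(d_x, d_z))
W_v = rng.normal(size=(d_x, d_z))
a_k = 0.1 * rng.normal(size=(n, n, d_z))
a_v = 0.1 * rng.normal(size=(n, n, d_z))
print(relative_self_attention(x, W_q, W_k, W_v, a_k, a_v).shape)  # (5, 16)
```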

Source: Jake Tae

Image Source: [Relative Positional Encoding for Transformers with Linear Complexity](https://www.youtube.com/watch?v=qajudaEHuq8)