
What is: Absolute Position Encodings?

Source: Attention Is All You Need
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Absolute Position Encodings are a type of position embeddings for Transformer-based models where positional encodings are added to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension $d_{model}$ as the embeddings, so that the two can be summed. In the original implementation, sine and cosine functions of different frequencies are used:

$$\text{PE}\left(pos, 2i\right) = \sin\left(pos/10000^{2i/d_{model}}\right)$$

$$\text{PE}\left(pos, 2i+1\right) = \cos\left(pos/10000^{2i/d_{model}}\right)$$

where $pos$ is the position and $i$ is the dimension. That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$. This function was chosen because the authors hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset $k$, $\text{PE}_{pos+k}$ can be represented as a linear function of $\text{PE}_{pos}$.
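
One way to see the linear-function claim (a standard trigonometric step, not spelled out in the text above): writing the frequency of dimension pair $i$ as $\omega_i = 1/10000^{2i/d_{model}}$, the angle-addition identities give

$$\begin{aligned}
\sin\left(\omega_i (pos + k)\right) &= \sin(\omega_i\, pos)\cos(\omega_i k) + \cos(\omega_i\, pos)\sin(\omega_i k),\\
\cos\left(\omega_i (pos + k)\right) &= \cos(\omega_i\, pos)\cos(\omega_i k) - \sin(\omega_i\, pos)\sin(\omega_i k),
\end{aligned}$$

so each pair $(\text{PE}_{pos+k, 2i}, \text{PE}_{pos+k, 2i+1})$ is a fixed rotation of $(\text{PE}_{pos, 2i}, \text{PE}_{pos, 2i+1})$ that depends only on the offset $k$.

Below is a minimal NumPy sketch of the two formulas above; the function name is illustrative and it assumes $d_{model}$ is even, neither of which comes from the paper.

```python
import numpy as np

def sinusoidal_position_encodings(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) matrix of absolute position encodings.

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Assumes d_model is even (hypothetical helper, for illustration only).
    """
    positions = np.arange(max_len)[:, np.newaxis]             # (max_len, 1)
    two_i = np.arange(0, d_model, 2)[np.newaxis, :]           # (1, d_model // 2)
    angles = positions / np.power(10000.0, two_i / d_model)   # (max_len, d_model // 2)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return pe

# The encodings are added element-wise to the token embeddings at the bottom of
# the encoder and decoder stacks, e.g.:
#   x = token_embeddings + sinusoidal_position_encodings(seq_len, d_model)
```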

Image Source: D2L.ai