What is: Positional Encoding Generator?

Source: Conditional Positional Encodings for Vision Transformers
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Positional Encoding Generator, or PEG, is the module that produces the position embeddings in Conditional Positional Encoding (CPE). It dynamically produces positional encodings conditioned on the local neighborhood of each input token. To condition on the local neighbors, we first reshape the flattened input sequence $X \in \mathbb{R}^{B \times N \times C}$ of DeiT back to $X^{\prime} \in \mathbb{R}^{B \times H \times W \times C}$ in the 2-D image space. Then, a function (denoted by $\mathcal{F}$ in the paper's figure) is repeatedly applied to the local patches in $X^{\prime}$ to produce the conditional positional encodings $E \in \mathbb{R}^{B \times H \times W \times C}$. PEG can be efficiently implemented with a 2-D convolution with kernel size $k$ ($k \geq 3$) and $\frac{k-1}{2}$ zero padding. Note that the zero padding is important because it makes the model aware of absolute positions, and $\mathcal{F}$ can take various forms, such as separable convolutions.
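
The reshape, zero-pad, and convolve steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `peg_encodings` and its parameters are hypothetical, $\mathcal{F}$ is assumed to be a depthwise convolution (one $k \times k$ filter per channel), and the sequence is assumed to contain only patch tokens so that $N = H \times W$ (a class token, if present, would have to be split off first).

```python
import numpy as np


def peg_encodings(x, height, width, weight, bias=None):
    """Return conditional positional encodings E with shape (B, H, W, C).

    Hypothetical helper for illustration only.
    x      : (B, N, C) flattened token sequence, assuming N == height * width
    weight : (k, k, C) depthwise kernel with odd k >= 3
    bias   : optional (C,) per-channel bias
    """
    batch, num_tokens, channels = x.shape
    k = weight.shape[0]
    if num_tokens != height * width:
        raise ValueError("N must equal height * width")
    if weight.shape != (k, k, channels) or k < 3 or k % 2 == 0:
        raise ValueError("weight must have shape (k, k, C) with odd k >= 3")

    # Reshape the flattened sequence X back to the 2-D image space (X').
    x_2d = x.reshape(batch, height, width, channels)

    # Zero-pad (k - 1) / 2 positions on every spatial border.
    pad = (k - 1) // 2
    padded = np.pad(x_2d, ((0, 0), (pad, pad), (pad, pad), (0, 0)))

    # Apply F to each local k x k patch. Here F is a depthwise convolution,
    # computed as a sum of shifted, per-channel-scaled copies of the input.
    encodings = np.zeros_like(x_2d, dtype=np.result_type(x, weight))
    for i in range(k):
        for j in range(k):
            encodings += padded[:, i:i + height, j:j + width, :] * weight[i, j]
    if bias is not None:
        encodings += bias
    return encodings


# Usage: a constant 4 x 4 input with an all-ones 3 x 3 kernel.
x = np.ones((1, 16, 1))
w = np.ones((3, 3, 1))
e = peg_encodings(x, 4, 4, w)
print(e[0, :, :, 0])  # corners are 4, other border cells 6, interior cells 9
```

Even though every input token is identical, the output differs between corner, edge, and interior positions. That difference comes entirely from the zero padding, which is how the border gives the model a cue about absolute position.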