Viet-Anh on Software Logo

What is: Conditional Positional Encoding?

SourceConditional Positional Encodings for Vision Transformers
Year2000
Data SourceCC BY-SA - https://paperswithcode.com

Conditional Positional Encoding, or CPE, is a type of positional encoding for vision transformers. Unlike previous fixed or learnable positional encodings, which are predefined and independent of input tokens, CPE is dynamically generated and conditioned on the local neighborhood of the input tokens. As a result, CPE aims to generalize to the input sequences that are longer than what the model has ever seen during training. CPE can also keep the desired translation-invariance in the image classification task. CPE can be implemented with a Position Encoding Generator (PEG) and incorporated into the current Transformer framework.