Viet-Anh on Software

What is: OODformer?

Source: OODformer: Out-Of-Distribution Detection Transformer
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

OODformer is a transformer-based out-of-distribution (OOD) detection architecture that leverages the contextualization capabilities of the transformer. Using the transformer as the principal feature extractor allows it to exploit object concepts and their discriminative attributes, along with their co-occurrence, via visual attention.
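To make the role of attention more concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention over a [class] token and a few patch embeddings. It is a simplification, not OODformer's code: real ViT/DeiT encoders use multiple heads, layer normalization, residual connections and an MLP block, and their weights are learned rather than random. The function name `self_attention`, the weight matrices `w_q`, `w_k`, `w_v` and all sizes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (simplified, hypothetical).

    tokens: (n_tokens, d) array; row 0 is the [class] token, the rest are patches.
    Returns the updated tokens and the (n_tokens, n_tokens) attention matrix.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1
    return attn @ v, attn

rng = np.random.default_rng(0)
d, n_patches = 16, 9

# Hypothetical inputs: a random [class] token plus random patch embeddings.
cls_token = rng.normal(size=(1, d))
patches = rng.normal(size=(n_patches, d))
tokens = np.concatenate([cls_token, patches], axis=0)

# Random projections stand in for learned query/key/value weights.
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)

print(out.shape)  # (10, 16)
# Row 0 of attn holds the weights the [class] token assigns to every token.
print(attn[0])
```

Because each attention row is a softmax, the updated [class] token (`out[0]`) is a weighted average of the value vectors of all tokens, which is one way to read the claim that it gathers information from the whole image.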

OODformer employs ViT and its data-efficient variant DeiT. Each encoder layer consists of a multi-head self-attention (MSA) block and a multi-layer perceptron (MLP) block. Together, the MSA and MLP layers in the encoder encode the importance of each attribute, the correlations between attributes, and their co-occurrence. The [class] token, which serves as the representation of an image $x$, consolidates multiple attributes and their related features via the global context.

The [class] token from the final layer is used for OOD detection in two ways: first, it is passed to the classifier $F_{\text{classifier}}(x_{\text{feat}})$ to obtain a softmax confidence score, and second, it is used to compute distances in the latent space.
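The two uses of the final-layer [class] token can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the classifier is a hypothetical, untrained linear head standing in for $F_{\text{classifier}}$; the softmax confidence is taken as the maximum softmax probability; and the latent-space distance is assumed to be the Mahalanobis distance to the nearest class mean, with a shared covariance estimated from in-distribution features (the text above does not name the metric). The features are synthetic clusters, and the helper names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    """Maximum softmax probability: higher suggests in-distribution."""
    return softmax(logits).max(axis=-1)

def fit_class_gaussians(train_feats, train_labels, n_classes):
    """Class means and shared inverse covariance of in-distribution features."""
    means = np.stack([train_feats[train_labels == c].mean(axis=0)
                      for c in range(n_classes)])
    centered = train_feats - means[train_labels]
    cov = centered.T @ centered / len(train_feats)
    # pinv guards against a singular covariance estimate.
    return means, np.linalg.pinv(cov)

def mahalanobis_score(feats, means, cov_inv):
    """Distance to the closest class mean: higher suggests OOD."""
    diffs = feats[:, None, :] - means[None, :, :]              # (n, C, d)
    d2 = np.einsum("ncd,de,nce->nc", diffs, cov_inv, diffs)    # squared distances
    return np.sqrt(np.maximum(d2.min(axis=1), 0.0))

rng = np.random.default_rng(0)
d, n_classes = 8, 3

# Hypothetical in-distribution [class]-token features: three separated clusters.
centers = rng.normal(scale=5.0, size=(n_classes, d))
train_labels = rng.integers(0, n_classes, size=300)
train_feats = centers[train_labels] + rng.normal(size=(300, d))
means, cov_inv = fit_class_gaussians(train_feats, train_labels, n_classes)

# Hypothetical untrained linear head standing in for F_classifier.
W, b = rng.normal(size=(d, n_classes)), np.zeros(n_classes)

in_feat = centers[:1] + rng.normal(size=(1, d))  # near a training cluster
ood_feat = np.full((1, d), 30.0)                  # far from every cluster

for name, x in [("in-dist", in_feat), ("OOD", ood_feat)]:
    print(name, msp_score(x @ W + b), mahalanobis_score(x, means, cov_inv))
```

Since the head here is random, its softmax scores only illustrate the mechanics; the distance score, by contrast, already separates the two synthetic inputs. In either case, a decision threshold would typically be chosen on held-out in-distribution data.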