
What is: Twins-SVT?

Source: Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Twins-SVT is a vision transformer that uses a spatially separable attention mechanism (SSAM) composed of two types of attention operation: (i) locally-grouped self-attention (LSA), which captures fine-grained, short-distance information, and (ii) global sub-sampled attention (GSA), which handles long-distance, global interactions. On top of this, Twins-SVT employs conditional position encodings and adopts the architectural design of the Pyramid Vision Transformer.
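
To make the two operations concrete, below is a minimal PyTorch sketch of one LSA/GSA pair. The module names, the window size (7), and the sub-sampling ratio (4) are illustrative assumptions for this example, not the authors' released implementation: LSA restricts attention to non-overlapping windows of the feature map, while GSA lets every token attend to a strided-convolution summary of the whole map.

```python
import torch
import torch.nn as nn

class LocallyGroupedAttention(nn.Module):
    """LSA: self-attention computed independently inside non-overlapping windows."""
    def __init__(self, dim, num_heads=4, window=7):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):
        # x: (B, H*W, C); assumes H and W are divisible by the window size
        B, N, C = x.shape
        w = self.window
        x = x.view(B, H // w, w, W // w, w, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)  # one row per window
        x, _ = self.attn(x, x, x)                               # attention within each window
        x = x.view(B, H // w, W // w, w, w, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(B, N, C)

class GlobalSubsampledAttention(nn.Module):
    """GSA: every token attends to a spatially sub-sampled set of keys/values."""
    def __init__(self, dim, num_heads=4, sr_ratio=4):
        super().__init__()
        # a strided convolution acts as the sub-sampling function
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):
        B, N, C = x.shape
        kv = x.transpose(1, 2).reshape(B, C, H, W)
        kv = self.sr(kv).flatten(2).transpose(1, 2)  # (B, (H/sr)*(W/sr), C)
        x, _ = self.attn(x, kv, kv)                  # full-length queries, reduced keys/values
        return x

# Alternate the two operations, as Twins-SVT blocks do within a stage.
x = torch.randn(2, 28 * 28, 64)             # a batch of 28x28 feature maps, 64 channels
lsa = LocallyGroupedAttention(64, window=7)
gsa = GlobalSubsampledAttention(64, sr_ratio=4)
out = gsa(lsa(x, 28, 28), 28, 28)
print(out.shape)                            # torch.Size([2, 784, 64])
```

Alternating the two blocks gives each stage both local detail and global context: LSA keeps the quadratic cost of attention confined to small windows, and GSA restores cross-window communication against a much shorter key/value sequence.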