What is: Twins-SVT?
Source | Twins: Revisiting the Design of Spatial Attention in Vision Transformers |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
Twins-SVT is a type of vision transformer that uses a spatially separable attention mechanism (SSAM) composed of two attention operations: (i) locally-grouped self-attention (LSA), which captures fine-grained, short-distance information, and (ii) global sub-sampled attention (GSA), which handles long-distance, global information. On top of this, it employs conditional position encodings and adopts the architectural design of the Pyramid Vision Transformer.
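The interleaving of LSA and GSA can be sketched in plain NumPy. This is a minimal, single-head illustration with no learned projections or position encodings; the function names, window size, and pooling stride are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention (single head, no projections).
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def lsa(x, window):
    # Locally-grouped self-attention: tokens attend only within
    # non-overlapping window-by-window groups (short-distance info).
    H, W, C = x.shape
    out = np.empty_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            g = x[i:i + window, j:j + window].reshape(-1, C)
            out[i:i + window, j:j + window] = attention(g, g, g).reshape(window, window, C)
    return out

def gsa(x, stride):
    # Global sub-sampled attention: every token queries a sub-sampled
    # (here: average-pooled) set of keys/values (long-distance info).
    H, W, C = x.shape
    sub = x.reshape(H // stride, stride, W // stride, stride, C).mean(axis=(1, 3)).reshape(-1, C)
    q = x.reshape(-1, C)
    return attention(q, sub, sub).reshape(H, W, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))        # an 8x8 feature map with 16 channels
y = gsa(lsa(x, window=4), stride=4)        # one LSA block followed by one GSA block
print(y.shape)                             # (8, 8, 16)
```

In the real model each block also includes multi-head projections, feed-forward layers, and residual connections; the sketch only shows how the two attention patterns partition local and global interactions.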