What is: Twins-SVT?
Source | Twins: Revisiting the Design of Spatial Attention in Vision Transformers |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
Twins-SVT is a type of vision transformer that uses a spatially separable attention mechanism (SSAM) composed of two attention operations: (i) locally-grouped self-attention (LSA), which captures fine-grained, short-distance information, and (ii) global sub-sampled attention (GSA), which handles long-distance, global information. On top of this, it employs conditional position encodings and adopts the architectural design of the Pyramid Vision Transformer.
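The interleaving of LSA and GSA can be sketched in plain NumPy. This is a minimal, single-head illustration with no learned projections or position encodings; the function names, window size, and pooling stride are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention (single head, no projections).
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def lsa(x, window):
    # Locally-grouped self-attention: tokens attend only within
    # non-overlapping window-by-window groups (short-distance info).
    H, W, C = x.shape
    out = np.empty_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            g = x[i:i + window, j:j + window].reshape(-1, C)
            out[i:i + window, j:j + window] = attention(g, g, g).reshape(window, window, C)
    return out

def gsa(x, stride):
    # Global sub-sampled attention: every token queries a sub-sampled
    # (here: average-pooled) set of keys/values (long-distance info).
    H, W, C = x.shape
    sub = x.reshape(H // stride, stride, W // stride, stride, C).mean(axis=(1, 3)).reshape(-1, C)
    q = x.reshape(-1, C)
    return attention(q, sub, sub).reshape(H, W, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))        # an 8x8 feature map with 16 channels
y = gsa(lsa(x, window=4), stride=4)        # one LSA block followed by one GSA block
print(y.shape)                             # (8, 8, 16)
```

In the real model each block also includes multi-head projections, feed-forward layers, and residual connections; the sketch only shows how the two attention patterns partition local and global interactions.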