What is: Cross-Attention Module?
Source | CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
The Cross-Attention module is an attention module used in CrossViT for the fusion of multi-scale features. The CLS token of the large branch serves as a query token that interacts with the patch tokens from the small branch through attention; projections $f^{l}(\cdot)$ and $g^{l}(\cdot)$ align the dimensions between the two branches. The small branch follows the same procedure, but swaps its CLS token with the patch tokens of the other branch.
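The PyTorch sketch below illustrates the idea under stated assumptions: only the CLS token (index 0 of the token sequence) produces a query, while keys and values come from the full sequence of the other branch's patch tokens. The class name `CrossAttention` and all hyperparameters are illustrative, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Sketch of CrossViT-style cross-attention: the CLS token of one
    branch attends to the patch tokens of the other branch."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.wq = nn.Linear(dim, dim)   # query: built from the CLS token only
        self.wk = nn.Linear(dim, dim)   # keys: built from all tokens
        self.wv = nn.Linear(dim, dim)   # values: built from all tokens
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, N, C) = [CLS token of this branch; patch tokens of the other branch]
        B, N, C = x.shape
        H = self.num_heads
        # Only the CLS token queries, so attention costs O(N) instead of O(N^2).
        q = self.wq(x[:, :1]).reshape(B, 1, H, C // H).transpose(1, 2)   # (B, H, 1, C//H)
        k = self.wk(x).reshape(B, N, H, C // H).transpose(1, 2)          # (B, H, N, C//H)
        v = self.wv(x).reshape(B, N, H, C // H).transpose(1, 2)          # (B, H, N, C//H)
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)  # (B, H, 1, N)
        out = (attn @ v).transpose(1, 2).reshape(B, 1, C)                # updated CLS token
        return self.proj(out)

# Hypothetical usage: fuse the large-branch CLS token with small-branch patches.
# f and g stand in for the dimension-aligning projections f^l(.) and g^l(.).
B, dim_l, dim_s = 2, 192, 96
f = nn.Linear(dim_l, dim_s)          # f(.): project large-branch CLS to small-branch dim
g = nn.Linear(dim_s, dim_l)          # g(.): project the fused CLS back to large-branch dim
cls_l = torch.randn(B, 1, dim_l)     # CLS token of the large branch
patches_s = torch.randn(B, 16, dim_s)  # patch tokens of the small branch
ca = CrossAttention(dim_s, num_heads=3)
tokens = torch.cat([f(cls_l), patches_s], dim=1)
cls_fused = cls_l + g(ca(tokens))    # residual back-projection of the fused CLS token
```

Because a single CLS token serves as the query, this fusion step is linear in the number of patch tokens, which is the efficiency argument behind using cross-attention rather than full self-attention across both branches.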