What is: Cross-Attention Module?
Source | CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
The Cross-Attention module is an attention module used in CrossViT for the fusion of multi-scale features. The CLS token of the large branch serves as a query token that interacts with the patch tokens from the small branch through attention; projections $f^{l}(\cdot)$ and $g^{l}(\cdot)$ align the dimensions between the two branches. The small branch follows the same procedure, but swaps its CLS token with the patch tokens of the other branch.
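The PyTorch sketch below illustrates the idea under stated assumptions: only the CLS token (index 0 of the token sequence) produces a query, while keys and values come from the full sequence of the other branch's patch tokens. The class name `CrossAttention` and all hyperparameters are illustrative, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Sketch of CrossViT-style cross-attention: the CLS token of one
    branch attends to the patch tokens of the other branch."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.wq = nn.Linear(dim, dim)   # query: built from the CLS token only
        self.wk = nn.Linear(dim, dim)   # keys: built from all tokens
        self.wv = nn.Linear(dim, dim)   # values: built from all tokens
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, N, C) = [CLS token of this branch; patch tokens of the other branch]
        B, N, C = x.shape
        H = self.num_heads
        # Only the CLS token queries, so attention costs O(N) instead of O(N^2).
        q = self.wq(x[:, :1]).reshape(B, 1, H, C // H).transpose(1, 2)   # (B, H, 1, C//H)
        k = self.wk(x).reshape(B, N, H, C // H).transpose(1, 2)          # (B, H, N, C//H)
        v = self.wv(x).reshape(B, N, H, C // H).transpose(1, 2)          # (B, H, N, C//H)
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)  # (B, H, 1, N)
        out = (attn @ v).transpose(1, 2).reshape(B, 1, C)                # updated CLS token
        return self.proj(out)

# Hypothetical usage: fuse the large-branch CLS token with small-branch patches.
# f and g stand in for the dimension-aligning projections f^l(.) and g^l(.).
B, dim_l, dim_s = 2, 192, 96
f = nn.Linear(dim_l, dim_s)          # f(.): project large-branch CLS to small-branch dim
g = nn.Linear(dim_s, dim_l)          # g(.): project the fused CLS back to large-branch dim
cls_l = torch.randn(B, 1, dim_l)     # CLS token of the large branch
patches_s = torch.randn(B, 16, dim_s)  # patch tokens of the small branch
ca = CrossAttention(dim_s, num_heads=3)
tokens = torch.cat([f(cls_l), patches_s], dim=1)
cls_fused = cls_l + g(ca(tokens))    # residual back-projection of the fused CLS token
```

Because a single CLS token serves as the query, this fusion step is linear in the number of patch tokens, which is the efficiency argument behind using cross-attention rather than full self-attention across both branches.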