What is: Cross-Covariance Attention?

Cross-Covariance Attention, or XCA, is an attention mechanism that operates along the feature dimension rather than the token dimension used in conventional transformers.

Using the definitions of queries, keys and values from conventional attention, the cross-covariance attention function is defined as:

$$\text{XC-Attention}(Q, K, V) = V \mathcal{A}_{\mathrm{XC}}(K, Q), \quad \mathcal{A}_{\mathrm{XC}}(K, Q) = \operatorname{Softmax}\left(\hat{K}^{\top} \hat{Q} / \tau\right)$$

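where each output token embedding is a convex combination of the $d_{v}$ features of its corresponding token embedding in $V$. Here $\hat{K}$ and $\hat{Q}$ are the $\ell_2$-normalized keys and queries, and $\tau$ is a learnable temperature parameter. The attention weights $\mathcal{A}_{\mathrm{XC}}$ are computed from the cross-covariance matrix $\hat{K}^{\top} \hat{Q}$, whose size depends on the feature dimension and not on the number of tokens.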
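
To make the formula concrete, here is a minimal single-head NumPy sketch. It is an illustration under stated assumptions, not the reference XCiT implementation: the helper names (`l2_normalize_columns`, `xc_attention`), the fixed `tau=1.0` (learnable in the original model), the toy sizes, and the choice to normalize the softmax over each column of the attention map (so that every output feature is a convex combination of the features in $V$, as described above) are all choices made for this example.

```python
import numpy as np


def l2_normalize_columns(x, eps=1e-12):
    """Scale each column (one feature across all N tokens) to unit L2 norm."""
    norms = np.linalg.norm(x, axis=0, keepdims=True)
    return x / np.maximum(norms, eps)


def xc_attention(q, k, v, tau=1.0):
    """Single-head cross-covariance attention for q, k, v of shape (N, d)."""
    q_hat = l2_normalize_columns(q)
    k_hat = l2_normalize_columns(k)

    # (d, d) cross-covariance logits; the size does not depend on N.
    logits = k_hat.T @ q_hat / tau

    # Softmax over axis 0 so that every column of `attn` sums to 1
    # (assumption of this sketch, chosen to match the convex-combination claim).
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    attn = weights / weights.sum(axis=0, keepdims=True)

    # Each output feature of a token mixes the d features of the same token in v.
    return v @ attn, attn


rng = np.random.default_rng(0)
n_tokens, d = 6, 4  # hypothetical toy sizes
q, k, v = (rng.standard_normal((n_tokens, d)) for _ in range(3))
out, attn = xc_attention(q, k, v, tau=1.0)
print(out.shape, attn.shape)
```

With these toy sizes the output keeps the shape of $V$, `(6, 4)`, while the attention map is `(4, 4)`: it grows with the feature dimension and not with the number of tokens, which is the practical difference from token-wise self-attention.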

Source	XCiT: Cross-Covariance Image Transformers
Year	2021
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software
