What is: Cross-Covariance Attention?
Source | XCiT: Cross-Covariance Image Transformers |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
Cross-Covariance Attention, or XCA, is an attention mechanism that operates along the feature dimension rather than along the token dimension used in conventional transformers.
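Using the definitions of queries, keys and values from conventional attention, the cross-covariance attention function is defined as:
$$ \text{XC-Attention}(Q, K, V) = V \mathcal{A}_{\text{XC}}(K, Q), \qquad \mathcal{A}_{\text{XC}}(K, Q) = \text{Softmax}\left(\hat{K}^{\top} \hat{Q} / \tau\right) $$
where each output token embedding is a convex combination of the $d_v$ features of its corresponding token embedding in $V$. The attention weights $\mathcal{A}$ are computed based on the cross-covariance matrix $\hat{K}^{\top} \hat{Q}$, in which $\hat{K}$ and $\hat{Q}$ are the $\ell_2$-normalized keys and queries and $\tau$ is a learnable temperature.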
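The computation can be illustrated with a minimal single-head NumPy sketch. It assumes the formulation from the XCiT paper, output = V · Softmax(K̂ᵀQ̂ / τ), with each feature column of the queries and keys scaled to unit L2 norm and the softmax normalizing each column of the d × d attention map. The function names (`l2_normalize_columns`, `xc_attention`), the fixed `tau=1.0`, and the random projection matrices are illustrative assumptions, not the authors' code; multi-head (block-diagonal) attention and learned parameters are omitted.

```python
import numpy as np


def l2_normalize_columns(m, eps=1e-12):
    # Scale each column (one feature observed across all N tokens) to unit L2 norm.
    return m / (np.linalg.norm(m, axis=0, keepdims=True) + eps)


def xc_attention(q, k, v, tau=1.0):
    """Single-head cross-covariance attention for (N, d) inputs (illustrative sketch)."""
    q_hat = l2_normalize_columns(q)
    k_hat = l2_normalize_columns(k)
    # (d, d) cross-covariance between key features and query features.
    logits = k_hat.T @ q_hat / tau
    # Numerically stable softmax over axis 0, so every column sums to 1.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Each output feature of a token is a convex combination of that token's value features.
    return v @ weights, weights


rng = np.random.default_rng(0)
n_tokens, dim = 6, 4
x = rng.normal(size=(n_tokens, dim))
# Hypothetical projection weights, drawn at random for the example.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
q, k, v = x @ w_q, x @ w_k, x @ w_v

out, attn = xc_attention(q, k, v)
print(out.shape, attn.shape)  # (6, 4) (4, 4)
```

Note that the attention map has shape d × d regardless of the number of tokens, which is the practical difference from token self-attention, whose map is N × N.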