What is: XCiT Layer?
Source | XCiT: Cross-Covariance Image Transformers |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
An XCiT Layer is the main building block of the XCiT architecture, which uses a cross-covariance attention operator as its principal operation. The layer consists of three main blocks, each preceded by LayerNorm and followed by a residual connection: (i) the core cross-covariance attention (XCA) operation, (ii) the local patch interaction (LPI) module, and (iii) a feed-forward network (FFN). Because XCA transposes the query-key interaction, attention is computed over feature channels rather than over tokens, so its computational complexity is linear in the number of tokens N rather than quadratic as in conventional self-attention.
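The structure above maps naturally to code. Below is a minimal PyTorch sketch of the three sub-blocks, not the reference implementation: the module names CrossCovarianceAttention, LocalPatchInteraction, and XCiTLayer are illustrative, and details of the full model such as LayerScale, dropout, and the class-attention stage are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossCovarianceAttention(nn.Module):
    """XCA sketch: attention is computed over the feature (channel) dimension,
    giving a (head_dim x head_dim) map per head instead of an (N x N) map over tokens."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # learnable temperature rescaling the normalized attention logits
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))

    def forward(self, x):                                   # x: (B, N, C)
        B, N, C = x.shape
        head_dim = C // self.num_heads
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, head_dim)
        q, k, v = qkv.permute(2, 0, 3, 4, 1).unbind(0)      # each: (B, heads, head_dim, N)

        # L2-normalize queries and keys along the token axis
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)

        # cross-covariance attention map: cost scales linearly with N
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, heads, head_dim, head_dim)
        attn = attn.softmax(dim=-1)

        out = attn @ v                                       # (B, heads, head_dim, N)
        out = out.permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)


class LocalPatchInteraction(nn.Module):
    """Simplified LPI sketch: two depth-wise 3x3 convolutions that let
    neighbouring patches communicate (XCA itself mixes channels, not tokens)."""

    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.act = nn.GELU()
        self.bn = nn.BatchNorm2d(dim)
        self.conv2 = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)

    def forward(self, x, H, W):                              # x: (B, N, C) with N == H * W
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)            # back to a 2D patch grid
        x = self.conv2(self.bn(self.act(self.conv1(x))))
        return x.reshape(B, C, N).transpose(1, 2)


class XCiTLayer(nn.Module):
    """XCiT layer: pre-norm XCA, LPI, and FFN sub-blocks, each with a residual connection."""

    def __init__(self, dim, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.xca = CrossCovarianceAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.lpi = LocalPatchInteraction(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim)
        )

    def forward(self, x, H, W):
        x = x + self.xca(self.norm1(x))
        x = x + self.lpi(self.norm2(x), H, W)
        x = x + self.ffn(self.norm3(x))
        return x


# usage: a batch of 2 images tokenized into a 14x14 grid of 192-dim patch embeddings
tokens = torch.randn(2, 14 * 14, 192)
layer = XCiTLayer(dim=192, num_heads=4)
print(layer(tokens, H=14, W=14).shape)                       # torch.Size([2, 196, 192])
```

In this sketch the attention map has shape (head_dim x head_dim) regardless of how many patches are processed, which is what makes the cost of XCA grow linearly with the number of tokens N.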