
What is: XCiT Layer?

Source: XCiT: Cross-Covariance Image Transformers
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

An XCiT layer is the main building block of the XCiT architecture and uses the cross-covariance attention (XCA) operator as its principal operation. The layer consists of three main blocks, each preceded by LayerNorm and followed by a residual connection: (i) the core cross-covariance attention (XCA) operation, (ii) the local patch interaction (LPI) module, and (iii) a feed-forward network (FFN). Because XCA transposes the query-key interaction and attends over the feature dimension rather than the token dimension, its computational complexity is linear in the number of tokens N, rather than quadratic as in conventional self-attention.
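
As a rough illustration of this three-block structure, below is a minimal PyTorch-style sketch. The module names (`CrossCovarianceAttention`, `LocalPatchInteraction`, `XCiTLayer`), the hyperparameter defaults, and the omission of details such as LayerScale are illustrative assumptions, not the reference implementation.

```python
# Minimal sketch of an XCiT layer: pre-norm residual blocks around XCA, LPI, and FFN.
# Names and defaults are assumptions for illustration, not the official XCiT code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossCovarianceAttention(nn.Module):
    """Sketch of XCA: attention over the feature (channel) dimension.

    Queries and keys are L2-normalized along the token axis, and attention is
    computed on a (head_dim x head_dim) cross-covariance matrix, so the cost
    scales linearly with the number of tokens N.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable temperature for the normalized channel attention.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)  # each: (B, heads, head_dim, N)
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        # (head_dim x head_dim) cross-covariance attention map: linear in N.
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = attn.softmax(dim=-1)
        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)


class LocalPatchInteraction(nn.Module):
    """Sketch of LPI: depth-wise convolutions over the 2D patch grid."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv2d(dim, dim, kernel_size, padding=pad, groups=dim)
        self.act = nn.GELU()
        self.conv2 = nn.Conv2d(dim, dim, kernel_size, padding=pad, groups=dim)

    def forward(self, x: torch.Tensor, H: int, W: int) -> torch.Tensor:
        B, N, C = x.shape  # N must equal H * W
        x = x.transpose(1, 2).reshape(B, C, H, W)
        x = self.conv2(self.act(self.conv1(x)))
        return x.reshape(B, C, N).transpose(1, 2)


class XCiTLayer(nn.Module):
    """XCA -> LPI -> FFN, each with LayerNorm before and a residual after."""

    def __init__(self, dim: int, num_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.xca = CrossCovarianceAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.lpi = LocalPatchInteraction(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor, H: int, W: int) -> torch.Tensor:
        x = x + self.xca(self.norm1(x))        # (i) cross-covariance attention
        x = x + self.lpi(self.norm2(x), H, W)  # (ii) local patch interaction
        x = x + self.ffn(self.norm3(x))        # (iii) feed-forward network
        return x


# Example usage: 14x14 grid of 196 patch tokens with embedding dimension 192.
layer = XCiTLayer(dim=192)
tokens = torch.randn(2, 196, 192)
out = layer(tokens, H=14, W=14)  # shape: (2, 196, 192)
```

Note how the attention map in the sketch has shape (head_dim x head_dim) instead of (N x N), which is where the linear-in-N complexity of XCA comes from.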