What is: Cross-Covariance Attention?

Source: XCiT: Cross-Covariance Image Transformers
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Cross-Covariance Attention, or XCA, is an attention mechanism which operates along the feature dimension instead of the token dimension as in conventional transformers.

Using the definitions of queries, keys and values from conventional attention, the cross-covariance attention function is defined as:

$$\text{XC-Attention}(Q, K, V) = V\,\mathcal{A}_{\mathrm{XC}}(K, Q), \qquad \mathcal{A}_{\mathrm{XC}}(K, Q) = \operatorname{Softmax}\left(\hat{K}^{\top} \hat{Q} / \tau\right)$$

where each output token embedding is a convex combination of the $d_v$ features of its corresponding token embedding in $V$. Here $\hat{Q}$ and $\hat{K}$ are the $\ell_2$-normalised query and key matrices, and $\tau$ is a learnable temperature parameter that controls the sharpness of the softmax. The attention weights $\mathcal{A}$ are computed from the cross-covariance matrix between keys and queries, so the attention map has size $d \times d$ (feature-by-feature) rather than $N \times N$ (token-by-token), making its cost linear in the number of tokens.
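The mechanism can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes single-head attention, a fixed scalar temperature `tau`, and applies the softmax along the key axis so that each output feature is a convex combination of the $d_v$ value features, matching the description above.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def xc_attention(Q, K, V, tau=1.0):
    """Cross-covariance attention (single-head sketch).

    Q, K, V: arrays of shape (N, d) — N tokens, d features.
    Returns an (N, d) array; the d x d attention map mixes
    features instead of tokens.
    """
    # L2-normalise each feature column (length-N) of Q and K.
    Q_hat = Q / np.linalg.norm(Q, axis=0, keepdims=True)
    K_hat = K / np.linalg.norm(K, axis=0, keepdims=True)
    # d x d cross-covariance attention map; softmax over the key
    # axis so each column sums to 1 (a convex combination).
    A = softmax(K_hat.T @ Q_hat / tau, axis=0)
    return V @ A
```

Note that the attention map costs $O(N d^2)$ to form rather than the $O(N^2 d)$ of token-to-token attention, which is the main motivation for XCA on high-resolution images.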