What is: Multi-Head Linear Attention?

Source: Linformer: Self-Attention with Linear Complexity
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

Multi-Head Linear Attention is a type of linear multi-head self-attention module, proposed with the Linformer architecture. The main idea is to add two linear projection matrices $E_i, F_i \in \mathbb{R}^{n \times k}$ when computing the key and value. We first project the original $(n \times d)$-dimensional key and value layers $KW_i^K$ and $VW_i^V$ into $(k \times d)$-dimensional projected key and value layers. We then compute an $(n \times k)$-dimensional context mapping $\bar{P}$ using scaled dot-product attention:

$$\bar{\text{head}_i} = \text{Attention}\left(QW_i^Q,\, E_i K W_i^K,\, F_i V W_i^V\right)$$

$$\bar{\text{head}_i} = \text{softmax}\left(\frac{QW_i^Q \left(E_i K W_i^K\right)^{T}}{\sqrt{d_k}}\right) \cdot F_i V W_i^V$$

Finally, we compute context embeddings for each head using $\bar{P} \cdot \left(F_i V W_i^V\right)$.
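
To make the shapes concrete, here is a minimal PyTorch sketch of a single such head. It is an illustrative implementation under stated assumptions, not the authors' code: the class name `LinearAttentionHead`, the argument names `seq_len` and `proj_dim` (playing the roles of $n$ and $k$), and the random initialization of $E_i$ and $F_i$ are choices made for this example, and the projection matrices are stored with shape $(k, n)$ so they can left-multiply the length-$n$ key and value layers.

```python
# Minimal sketch of one Linformer-style linear attention head (illustrative, not reference code).
import math
import torch
import torch.nn as nn


class LinearAttentionHead(nn.Module):
    def __init__(self, d_model: int, d_k: int, seq_len: int, proj_dim: int):
        super().__init__()
        # Per-head projections W_i^Q, W_i^K, W_i^V
        self.w_q = nn.Linear(d_model, d_k, bias=False)
        self.w_k = nn.Linear(d_model, d_k, bias=False)
        self.w_v = nn.Linear(d_model, d_k, bias=False)
        # E_i, F_i compress the sequence length from n (seq_len) down to k (proj_dim).
        # Stored as (k, n) so they left-multiply the (n, d_k) key/value layers;
        # random scaled initialization is an assumption for this sketch.
        self.E = nn.Parameter(torch.randn(proj_dim, seq_len) / math.sqrt(seq_len))
        self.F = nn.Parameter(torch.randn(proj_dim, seq_len) / math.sqrt(seq_len))
        self.d_k = d_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d_model)
        q = self.w_q(x)              # Q W_i^Q        -> (batch, n, d_k)
        k = self.E @ self.w_k(x)     # E_i K W_i^K    -> (batch, k, d_k)
        v = self.F @ self.w_v(x)     # F_i V W_i^V    -> (batch, k, d_k)
        # Context mapping P_bar: (batch, n, k) instead of the usual (batch, n, n)
        p_bar = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)
        # Context embeddings for this head: P_bar . (F_i V W_i^V) -> (batch, n, d_k)
        return p_bar @ v


# Example: n = 1024 tokens compressed to k = 256 projected positions.
head = LinearAttentionHead(d_model=512, d_k=64, seq_len=1024, proj_dim=256)
out = head(torch.randn(2, 1024, 512))  # -> (2, 1024, 64)
```

Because the softmax is taken over the compressed $k$-dimensional axis rather than over all $n$ positions, time and memory scale as $O(nk)$ instead of $O(n^2)$. The full multi-head module then concatenates the $\bar{\text{head}_i}$ outputs and applies an output projection, exactly as in standard multi-head attention.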