**Multi-head Attention** is an attention module that runs an attention mechanism several times in parallel. The independent attention outputs are then concatenated and linearly transformed into the expected dimension. Intuitively, multiple attention heads allow the model to attend to parts of the sequence differently (e.g. longer-term dependencies versus shorter-term dependencies).

$$ \text{MultiHead}\left(\textbf{Q}, \textbf{K}, \textbf{V}\right) = \left[\text{head}\_{1},\dots,\text{head}\_{h}\right]\textbf{W}_{0}$$

$$\text{where} \text{ head}\_{i} = \text{Attention} \left(\textbf{Q}\textbf{W}\_{i}^{Q}, \textbf{K}\textbf{W}\_{i}^{K}, \textbf{V}\textbf{W}\_{i}^{V} \right) $$

In the equations above, all $\textbf{W}$ matrices are learnable parameters.

Note that [scaled dot-product attention](https://paperswithcode.com/method/scaled) is most commonly used in this module, although in principle it can be swapped out for other types of attention mechanism.

Source: [Lilian Weng](https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html#a-family-of-attention-mechanisms)
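The multi-head formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes scaled dot-product attention as the per-head mechanism, unbatched 2-D inputs, no masking, and randomly initialised weights. The function names and the shapes `n`, `d_model`, `h`, and `d_head` are hypothetical choices made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ v

def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_0):
    # W_q, W_k, W_v hold one projection matrix per head along the first axis.
    heads = [
        scaled_dot_product_attention(Q @ wq, K @ wk, V @ wv)
        for wq, wk, wv in zip(W_q, W_k, W_v)
    ]
    # Concatenate the head outputs and project back with W_0.
    return np.concatenate(heads, axis=-1) @ W_0

# Illustrative (assumed) sizes: 5 tokens, model width 16, 4 heads.
rng = np.random.default_rng(0)
n, d_model, h = 5, 16, 4
d_head = d_model // h

X = rng.normal(size=(n, d_model))
W_q = rng.normal(size=(h, d_model, d_head))
W_k = rng.normal(size=(h, d_model, d_head))
W_v = rng.normal(size=(h, d_model, d_head))
W_0 = rng.normal(size=(h * d_head, d_model))

# Self-attention: queries, keys, and values all come from X.
out = multi_head_attention(X, X, X, W_q, W_k, W_v, W_0)
print(out.shape)  # (5, 16)
```

Practical implementations usually fuse the per-head projections into a single matrix multiply and reshape the result; the explicit loop is kept here only because it mirrors the formula term by term.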

**DCN-V2** is an architecture for learning-to-rank that improves upon the original [DCN](http://paperswithcode.com/method/dcn) model. It first learns explicit feature interactions of the inputs (typically the embedding layer) through cross layers, and then combines them with a deep network to learn complementary implicit interactions. The core of DCN-V2 is its cross layers, which inherit the simple structure of the cross network from DCN but are significantly more expressive at learning explicit, bounded-degree cross features.
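To make the cross layers concrete, here is a minimal NumPy sketch. The text above does not give the update rule, so the sketch assumes the form used in the DCN-V2 paper, $\textbf{x}_{l+1} = \textbf{x}_{0} \odot \left(\textbf{W}_{l}\textbf{x}_{l} + \textbf{b}_{l}\right) + \textbf{x}_{l}$. The function names, sizes, and random weights are hypothetical, and the deep network that DCN-V2 combines with the cross network is omitted.

```python
import numpy as np

def cross_layer(x0, xl, W, b):
    # x0, xl: (batch, d); W: (d, d); b: (d,)
    # Element-wise product with x0 creates explicit feature crosses;
    # adding xl back is a residual connection.
    return x0 * (xl @ W.T + b) + xl

def cross_network(x0, weights, biases):
    # Stacking L cross layers yields feature crosses up to degree L + 1.
    x = x0
    for W, b in zip(weights, biases):
        x = cross_layer(x0, x, W, b)
    return x

# Illustrative (assumed) sizes: batch of 2, feature width 4, 3 cross layers.
rng = np.random.default_rng(0)
batch, d, num_layers = 2, 4, 3

x0 = rng.normal(size=(batch, d))  # stands in for the embedding-layer output
weights = [rng.normal(size=(d, d)) * 0.1 for _ in range(num_layers)]
biases = [np.zeros(d) for _ in range(num_layers)]

crossed = cross_network(x0, weights, biases)
print(crossed.shape)  # (2, 4)
```

Each layer multiplies by the original input once more, which is why the polynomial degree of the learned crosses is bounded by the number of layers.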

DCN-V2

DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems

Multi-Head Attention

Attention Is All You Need

**RepPoints** is a representation for object detection that consists of a set of points which indicate the spatial extent of an object and semantically significant local areas. This representation is learned via weak localization supervision from rectangular ground-truth boxes and implicit recognition feedback. Based on the richer RepPoints representation, the authors develop an anchor-free object detector that yields improved performance compared to using bounding boxes.

Source	Attention Is All You Need
Year	2017
Data Source	CC BY-SA - https://paperswithcode.com