
What is: Additive Attention?

Source: Neural Machine Translation by Jointly Learning to Align and Translate
Year: 2014
Data Source: CC BY-SA - https://paperswithcode.com

Additive Attention, also known as Bahdanau Attention, uses a one-hidden-layer feed-forward network to calculate the attention alignment score:

$$f_{att}\left(\mathbf{h}_{i}, \mathbf{s}_{j}\right) = \mathbf{v}_{a}^{T}\tanh\left(\mathbf{W}_{a}\left[\mathbf{h}_{i};\mathbf{s}_{j}\right]\right)$$

where $\mathbf{v}_{a}$ and $\mathbf{W}_{a}$ are learned attention parameters. Here $\mathbf{h}_{i}$ refers to the hidden states of the encoder, and $\mathbf{s}_{j}$ to the hidden states of the decoder. The function above is thus a type of alignment score function. A matrix of alignment scores can be used to visualize the correlation between source and target words.
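
Below is a minimal sketch of this score function in NumPy. The function name `additive_score` and the dimension conventions are illustrative, not from the paper; the learned parameters $\mathbf{W}_{a}$ and $\mathbf{v}_{a}$ are passed in as plain arrays.

```python
import numpy as np

def additive_score(h_i, s_j, W_a, v_a):
    """Additive (Bahdanau) alignment score for one encoder hidden
    state h_i and one decoder hidden state s_j.

    W_a has shape (d_att, len(h_i) + len(s_j)); v_a has shape (d_att,).
    """
    concat = np.concatenate([h_i, s_j])  # [h_i; s_j]
    return v_a @ np.tanh(W_a @ concat)   # v_a^T tanh(W_a [h_i; s_j])
```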

Within a neural network, once we have the alignment scores, we apply a softmax to them to obtain the final attention weights, ensuring they sum to 1.
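
Continuing the sketch above (and reusing `additive_score`), the snippet below scores one decoder state against every encoder state, normalizes the scores with a softmax, and forms the context vector as the weighted sum of encoder states. The dimensions and random parameters are hypothetical placeholders.

```python
def attention_weights(scores):
    """Softmax over alignment scores; subtracting the max is a
    standard numerical-stability trick."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

d_h, d_s, d_att = 4, 4, 8                # hypothetical dimensions
rng = np.random.default_rng(0)
H = rng.normal(size=(5, d_h))            # 5 encoder hidden states h_i
s_j = rng.normal(size=d_s)               # one decoder hidden state
W_a = rng.normal(size=(d_att, d_h + d_s))
v_a = rng.normal(size=d_att)

scores = np.array([additive_score(h_i, s_j, W_a, v_a) for h_i in H])
alpha = attention_weights(scores)        # attention weights, sum to 1
context = alpha @ H                      # context vector fed to the decoder
```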