
What is: Embedded Gaussian Affinity?

Source: Non-local Neural Networks
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

Embedded Gaussian Affinity is a type of affinity, or self-similarity, function between two points $\mathbf{x}_i$ and $\mathbf{x}_j$ that uses a Gaussian function in an embedding space:

$$f\left(\mathbf{x}_i, \mathbf{x}_j\right) = e^{\theta\left(\mathbf{x}_i\right)^{T}\phi\left(\mathbf{x}_j\right)}$$

Here $\theta\left(\mathbf{x}_i\right) = W_{\theta}\mathbf{x}_i$ and $\phi\left(\mathbf{x}_j\right) = W_{\phi}\mathbf{x}_j$ are two embeddings.
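As a concrete illustration, here is a minimal NumPy sketch of this affinity between two points. The weight names (`W_theta`, `W_phi`) and the dimensions are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Assumed illustrative sizes: d-dimensional points, k-dimensional embeddings.
d, k = 32, 8
rng = np.random.default_rng(0)
W_theta = rng.normal(size=(k, d))  # weights of the theta embedding
W_phi = rng.normal(size=(k, d))    # weights of the phi embedding

def embedded_gaussian_affinity(x_i, x_j):
    """f(x_i, x_j) = exp(theta(x_i)^T phi(x_j)), with linear embeddings theta and phi."""
    theta_i = W_theta @ x_i  # theta(x_i) = W_theta x_i
    phi_j = W_phi @ x_j      # phi(x_j)   = W_phi x_j
    return np.exp(theta_i @ phi_j)

x_i, x_j = rng.normal(size=d), rng.normal(size=d)
print(embedded_gaussian_affinity(x_i, x_j))  # a positive scalar similarity
```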

Note that the self-attention module used in the original Transformer model is a special case of non-local operations in the embedded Gaussian version. This can be seen from the fact that for a given $i$, the non-local operation $\frac{1}{\mathcal{C}\left(\mathbf{x}\right)}\sum_{\forall j} f\left(\mathbf{x}_i, \mathbf{x}_j\right) g\left(\mathbf{x}_j\right)$, with normalization factor $\mathcal{C}\left(\mathbf{x}\right) = \sum_{\forall j} f\left(\mathbf{x}_i, \mathbf{x}_j\right)$, becomes the softmax computation along the dimension $j$. So we have $\mathbf{y} = \text{softmax}\left(\mathbf{x}^{T} W_{\theta}^{T} W_{\phi} \mathbf{x}\right) g\left(\mathbf{x}\right)$, which is the self-attention form in the Transformer model. This shows how the recent self-attention model relates to the classic computer vision method of non-local means.
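The softmax correspondence can be made concrete with a short NumPy sketch of the full embedded Gaussian non-local operation over a set of points. Again, the weight names and shapes are illustrative assumptions; the key point is that dividing by $\mathcal{C}\left(\mathbf{x}\right)$ is exactly a softmax over $j$:

```python
import numpy as np

def nonlocal_embedded_gaussian(x, W_theta, W_phi, W_g):
    """y = softmax(x^T W_theta^T W_phi x) g(x), computed row-wise over j.

    x: (N, d) matrix of N points; W_theta, W_phi: (k, d); W_g: (d, d).
    """
    logits = (x @ W_theta.T) @ (x @ W_phi.T).T   # (N, N): theta(x_i)^T phi(x_j)
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponential
    f = np.exp(logits)                           # f(x_i, x_j)
    attn = f / f.sum(axis=1, keepdims=True)      # dividing by C(x) = softmax over j
    return attn @ (x @ W_g.T)                    # y_i = sum_j attn_ij * g(x_j)

rng = np.random.default_rng(1)
N, d, k = 16, 32, 8
x = rng.normal(size=(N, d))
W_theta, W_phi = rng.normal(size=(k, d)), rng.normal(size=(k, d))
W_g = rng.normal(size=(d, d))
print(nonlocal_embedded_gaussian(x, W_theta, W_phi, W_g).shape)  # (16, 32)
```

Each row of `attn` sums to one, so each output $\mathbf{y}_i$ is a softmax-weighted average of the transformed inputs $g\left(\mathbf{x}_j\right)$, which is precisely the Transformer's self-attention form.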