
What is: MoCo v3?

Source: An Empirical Study of Training Self-Supervised Vision Transformers
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

MoCo v3 aims to stabilize the training of self-supervised ViTs. It is an incremental improvement of MoCo v1/2. Two crops of each image are taken under random data augmentation. They are encoded by two encoders, $f_q$ and $f_k$, with output vectors $q$ and $k$. $q$ behaves like a "query", and the goal of learning is to retrieve the corresponding "key". The objective is to minimize a contrastive loss function of the following form:

$$\mathcal{L}_q = -\log \frac{\exp\left(q \cdot k^{+} / \tau\right)}{\exp\left(q \cdot k^{+} / \tau\right) + \sum_{k^{-}} \exp\left(q \cdot k^{-} / \tau\right)}$$

Here $k^{+}$ is the positive key (the other crop of the same image), $k^{-}$ are the negative keys, and $\tau$ is a temperature hyper-parameter.
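A minimal sketch of this loss in PyTorch, assuming MoCo v3's in-batch setting where the keys of all other images in the batch serve as negatives (no memory queue); the function name is illustrative, and $\tau = 0.2$ follows the paper's default:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q, k, tau=0.2):
    """InfoNCE loss: for query q[i] the positive key is k[i] (the other
    crop of the same image); all other keys in the batch are negatives."""
    q = F.normalize(q, dim=1)                  # (N, C) queries
    k = F.normalize(k, dim=1)                  # (N, C) keys
    logits = q @ k.t() / tau                   # (N, N) pairwise similarities
    labels = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)     # -log softmax of the positive pair
```

Cross-entropy over each row of the logits matrix computes exactly the $-\log$ ratio above, with the diagonal entry playing the role of $q \cdot k^{+}$.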

This approach trains the Transformer in the contrastive/Siamese paradigm. The encoder $f_q$ consists of a backbone (e.g., a ResNet or ViT), a projection head, and an extra prediction head. The encoder $f_k$ has the backbone and projection head but not the prediction head. $f_k$ is updated as a moving average of $f_q$, excluding the prediction head.
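A sketch of this two-encoder structure and the momentum update, assuming simplified single-linear heads (the paper uses MLP projection and prediction heads) and an illustrative momentum coefficient `m=0.99`:

```python
import copy
import torch
import torch.nn as nn

class MoCoV3(nn.Module):
    """Illustrative sketch of the two-encoder setup (not the reference code).
    f_q = backbone + projection head + prediction head; f_k is a momentum
    copy of the backbone + projection head only."""
    def __init__(self, backbone, feat_dim, dim=256, m=0.99):
        super().__init__()
        self.m = m
        self.base = nn.Sequential(backbone, nn.Linear(feat_dim, dim))  # f_q backbone + projection
        self.predictor = nn.Linear(dim, dim)       # prediction head, exists only in f_q
        self.momentum = copy.deepcopy(self.base)   # f_k starts as a copy of f_q
        for p in self.momentum.parameters():
            p.requires_grad = False                # f_k receives no gradients

    @torch.no_grad()
    def update_key_encoder(self):
        # f_k <- m * f_k + (1 - m) * f_q, excluding the prediction head
        for p_q, p_k in zip(self.base.parameters(), self.momentum.parameters()):
            p_k.mul_(self.m).add_(p_q, alpha=1 - self.m)

    def forward(self, x1, x2):
        q1 = self.predictor(self.base(x1))         # queries from both crops
        q2 = self.predictor(self.base(x2))
        with torch.no_grad():
            k1, k2 = self.momentum(x1), self.momentum(x2)  # keys, no gradient
        return q1, q2, k1, k2
```

In training, the loss is symmetrized over the two crops, e.g. `contrastive_loss(q1, k2) + contrastive_loss(q2, k1)`, and `update_key_encoder()` runs once per iteration after the optimizer step.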