
What is: ReLIC?

Source: Representation Learning via Invariant Causal Mechanisms
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

ReLIC, or Representation Learning via Invariant Causal Mechanisms, is a self-supervised learning objective that enforces invariant prediction of proxy targets across data augmentations via an invariance regularizer, which yields improved generalization guarantees.

We can write the objective as:

$$\underset{X}{\mathbb{E}} \; \underset{a_{lk}, a_{qt} \sim \mathcal{A}}{\mathbb{E}} \sum_{b \in \{a_{lk}, a_{qt}\}} \mathcal{L}_{b}\left(Y^{R}, f(X)\right) \quad \text{s.t.} \quad KL\left(p^{do(a_{lk})}\left(Y^{R} \mid f(X)\right),\, p^{do(a_{qt})}\left(Y^{R} \mid f(X)\right)\right) \leq \rho$$

where $\mathcal{L}$ is the proxy task loss and $KL$ is the Kullback-Leibler (KL) divergence. Note that any distance measure on distributions can be used in place of the KL divergence.

Concretely, as the proxy task we associate to every datapoint $x_{i}$ the label $y_{i}^{R} = i$. This corresponds to the instance discrimination task commonly used in contrastive learning. We take pairs of points $(x_{i}, x_{j})$ to compute similarity scores and use pairs of augmentations $a_{lk} = (a_{l}, a_{k}) \in \mathcal{A} \times \mathcal{A}$ to perform a style intervention. Given a batch of samples $\{x_{i}\}_{i=1}^{N} \sim \mathcal{D}$, we use

$$p^{do(a_{lk})}\left(Y^{R} = j \mid f(x_{i})\right) \propto \exp\left(\phi\left(f(x_{i}^{a_{l}}), h(x_{j}^{a_{k}})\right) / \tau\right)$$
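As a minimal sketch of this step (assuming PyTorch; the score matrix and the temperature value are illustrative, not the paper's exact setup), the intervention-conditional distribution is simply a row-wise, temperature-scaled softmax over the pairwise similarity scores within a batch:

```python
import torch
import torch.nn.functional as F

def label_distribution(scores: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # scores[i, j] = phi(f(x_i^{a_l}), h(x_j^{a_k})) for a batch of N points.
    # Row i of the result is p^{do(a_lk)}(Y^R = j | f(x_i)): a distribution over
    # the instance labels j = 1..N obtained by a temperature-scaled softmax.
    return F.softmax(scores / tau, dim=1)
```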

with $x^{a}$ the datapoint $x$ augmented with $a$ and $\tau$ a softmax temperature parameter. We encode $f$ using a neural network and choose $h$ to be related to $f$, e.g. $h = f$ or a network whose weights are an exponential moving average of the weights of $f$ (i.e. a target network). To compare representations we use the function $\phi(f(x_{i}), h(x_{j})) = \langle g(f(x_{i})), g(h(x_{j})) \rangle$, where $g$ is a fully-connected neural network often called the critic.
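A minimal sketch of these choices, again assuming PyTorch; the hidden sizes, projection dimension, and EMA momentum are illustrative assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Projection head g; phi is the inner product of projected representations."""
    def __init__(self, dim: int = 2048, proj_dim: int = 128):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, proj_dim))

    def forward(self, f_x: torch.Tensor, h_x: torch.Tensor) -> torch.Tensor:
        # phi(f(x_i), h(x_j)) = <g(f(x_i)), g(h(x_j))> for every pair (i, j)
        return self.g(f_x) @ self.g(h_x).T

@torch.no_grad()
def ema_update(target: nn.Module, online: nn.Module, m: float = 0.99) -> None:
    # One choice for h: a target network whose weights track an exponential
    # moving average of the weights of f.
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(m).add_(p_o, alpha=1.0 - m)
```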

Combining these pieces, we learn representations by minimizing the following objective over the full set of data $x_{i} \in \mathcal{D}$ and augmentations $a_{lk} \in \mathcal{A} \times \mathcal{A}$:

$$-\sum_{i=1}^{N} \sum_{a_{lk}} \log \frac{\exp\left(\phi\left(f(x_{i}^{a_{l}}), h(x_{i}^{a_{k}})\right) / \tau\right)}{\sum_{m=1}^{M} \exp\left(\phi\left(f(x_{i}^{a_{l}}), h(x_{m}^{a_{k}})\right) / \tau\right)} + \alpha \sum_{a_{lk}, a_{qt}} KL\left(p^{do(a_{lk})}, p^{do(a_{qt})}\right)$$

with $M$ the number of points we use to construct the contrast set and $\alpha$ the weighting of the invariance penalty. The shorthand $p^{do(a)}$ is used for $p^{do(a)}(Y^{R} = j \mid f(x_{i}))$. The figure in the original paper shows a schematic of the ReLIC objective.
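Putting the pieces together, here is a minimal sketch of the full objective, assuming PyTorch, that the contrast set is the batch itself ($M = N$), and that a single KL term between two sampled augmentation pairs is used per step; the helper names `encoder`, `target`, `critic`, and `augment` are hypothetical placeholders, not the authors' API:

```python
import torch
import torch.nn.functional as F

def relic_loss(x, encoder, target, critic, augment, tau=0.1, alpha=1.0):
    # encoder = f, target = h (e.g. an EMA copy of f), critic computes phi,
    # and augment draws a random augmentation for each element of the batch.
    n = x.size(0)
    labels = torch.arange(n, device=x.device)  # instance-discrimination labels y_i^R = i
    total, log_ps = 0.0, []
    for _ in range(2):                         # two style interventions a_lk and a_qt
        view_l, view_k = augment(x), augment(x)
        with torch.no_grad():
            h_k = target(view_k)               # target branch is not backpropagated through
        scores = critic(encoder(view_l), h_k)         # phi(f(x_i^{a_l}), h(x_j^{a_k}))
        log_p = F.log_softmax(scores / tau, dim=1)    # log p^{do(a_lk)}(Y^R = j | f(x_i))
        total = total + F.nll_loss(log_p, labels)     # contrastive proxy-task term
        log_ps.append(log_p)
    # invariance penalty: KL between the two intervention-conditional distributions
    kl = F.kl_div(log_ps[1], log_ps[0], reduction="batchmean", log_target=True)
    return total + alpha * kl
```

In this sketch the invariance penalty is what distinguishes the objective from a plain contrastive loss: it explicitly penalizes disagreement between the label distributions obtained under different augmentation pairs, rather than relying on the contrastive term alone to produce augmentation-invariant representations.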