What is: Contrastive Predictive Coding?

Source: Representation Learning with Contrastive Predictive Coding
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

Contrastive Predictive Coding (CPC) learns self-supervised representations by predicting the future in latent space with powerful autoregressive models. The model uses a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful for predicting future samples.

First, a non-linear encoder $g_{enc}$ maps the input sequence of observations $x_t$ to a sequence of latent representations $z_t = g_{enc}(x_t)$, potentially with a lower temporal resolution. Next, an autoregressive model $g_{ar}$ summarizes all $z_{\leq t}$ in the latent space and produces a context latent representation $c_t = g_{ar}(z_{\leq t})$.

A density ratio is modelled which preserves the mutual information between $x_{t+k}$ and $c_t$ as follows:

$$f_k\left(x_{t+k}, c_t\right) \propto \frac{p\left(x_{t+k} \mid c_t\right)}{p\left(x_{t+k}\right)}$$

where $\propto$ stands for "proportional to" (i.e. up to a multiplicative constant). Note that the density ratio $f$ can be unnormalized (it does not have to integrate to 1). The authors use a simple log-bilinear model:

$$f_k\left(x_{t+k}, c_t\right) = \exp\left(z_{t+k}^{T} W_k c_t\right)$$
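
As a concrete illustration, the log-bilinear score amounts to one learned linear map per prediction step $k$. Below is a minimal PyTorch sketch; the names `Wk`, `z_future`, and `c_t`, as well as the dimensions, are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

latent_dim, context_dim = 256, 512

# One learned matrix W_k per prediction step k (a single step shown here).
Wk = nn.Linear(context_dim, latent_dim, bias=False)

def log_density_ratio(z_future: torch.Tensor, c_t: torch.Tensor) -> torch.Tensor:
    """log f_k(x_{t+k}, c_t) = z_{t+k}^T W_k c_t (the exp is folded into the loss)."""
    pred = Wk(c_t)                        # (batch, latent_dim): W_k c_t
    return (z_future * pred).sum(dim=-1)  # (batch,): dot product with z_{t+k}
```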

Any type of encoder and autoregressive model can be used. The authors opt for strided convolutional layers with residual blocks as the encoder and a GRU as the autoregressive model.
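
The sketch below pairs these two components in PyTorch: a strided 1-D convolutional encoder (residual blocks omitted for brevity) followed by a GRU. The layer sizes, kernel sizes, and strides are illustrative assumptions rather than the exact configuration from the paper:

```python
import torch
import torch.nn as nn

class CPCBackbone(nn.Module):
    def __init__(self, latent_dim: int = 256, context_dim: int = 512):
        super().__init__()
        # g_enc: strided 1-D convolutions downsample the raw input in time,
        # giving the latents a lower temporal resolution than the observations.
        self.g_enc = nn.Sequential(
            nn.Conv1d(1, latent_dim, kernel_size=10, stride=5, padding=3), nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )
        # g_ar: a GRU summarizes z_{<=t} into the context representation c_t.
        self.g_ar = nn.GRU(latent_dim, context_dim, batch_first=True)

    def forward(self, x: torch.Tensor):
        # x: (batch, 1, samples) raw input sequence
        z = self.g_enc(x).transpose(1, 2)  # (batch, steps, latent_dim)
        c, _ = self.g_ar(z)                # (batch, steps, context_dim)
        return z, c
```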

The encoder and autoregressive model are trained jointly to minimize the InfoNCE loss, a categorical cross-entropy of classifying the positive sample correctly among a set of negatives.
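
A minimal sketch of the InfoNCE objective for one prediction step, assuming the other in-batch samples serve as negatives (a common implementation choice; the paper draws negatives from a proposal distribution):

```python
import torch
import torch.nn.functional as F

def info_nce(z_future: torch.Tensor, c_t: torch.Tensor, Wk) -> torch.Tensor:
    """z_future: (batch, latent_dim) true z_{t+k}; c_t: (batch, context_dim)."""
    pred = Wk(c_t)                          # (batch, latent_dim): W_k c_t
    logits = pred @ z_future.t()            # (batch, batch): scores f_k for all pairs
    labels = torch.arange(logits.size(0))   # positive pair sits on the diagonal
    return F.cross_entropy(logits, labels)  # softmax over one positive vs negatives
```

Minimizing this loss maximizes a lower bound on the mutual information between $c_t$ and $x_{t+k}$, which is what makes the learned representations useful for downstream tasks.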