
What is: Contrastive Predictive Coding?

Source: Representation Learning with Contrastive Predictive Coding
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

Contrastive Predictive Coding (CPC) learns self-supervised representations by predicting the future in latent space with powerful autoregressive models. The model uses a probabilistic contrastive loss that induces the latent space to capture information that is maximally useful for predicting future samples.

First, a non-linear encoder $g_{enc}$ maps the input sequence of observations $x_t$ to a sequence of latent representations $z_t = g_{enc}(x_t)$, potentially with a lower temporal resolution. Next, an autoregressive model $g_{ar}$ summarizes all $z_{\leq t}$ in the latent space and produces a context latent representation $c_t = g_{ar}(z_{\leq t})$.
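
A minimal sketch of these two modules, assuming PyTorch and a 1-D (audio-like) input; the layer sizes, kernel sizes, and class names are illustrative rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class CPCEncoder(nn.Module):
    """g_enc: maps raw observations x_t to latents z_t at a lower temporal rate."""
    def __init__(self, in_channels=1, latent_dim=256):
        super().__init__()
        # Strided convolutions downsample the sequence in time.
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, latent_dim, kernel_size=10, stride=5, padding=3),
            nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=8, stride=4, padding=2),
            nn.ReLU(),
        )

    def forward(self, x):               # x: (batch, channels, time)
        z = self.net(x)                 # (batch, latent_dim, reduced_time)
        return z.transpose(1, 2)        # (batch, reduced_time, latent_dim)

class CPCAutoregressive(nn.Module):
    """g_ar: summarizes z_{<=t} into a context vector c_t."""
    def __init__(self, latent_dim=256, context_dim=256):
        super().__init__()
        self.gru = nn.GRU(latent_dim, context_dim, batch_first=True)

    def forward(self, z):               # z: (batch, steps, latent_dim)
        c, _ = self.gru(z)              # c[:, t] summarizes z up to step t
        return c
```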

A density ratio is modelled which preserves the mutual information between $x_{t+k}$ and $c_t$ as follows:

$$f_k\left(x_{t+k}, c_t\right) \propto \frac{p\left(x_{t+k} \mid c_t\right)}{p\left(x_{t+k}\right)}$$

where $\propto$ stands for "proportional to" (i.e. up to a multiplicative constant). Note that the density ratio $f$ can be unnormalized (it does not have to integrate to 1). The authors use a simple log-bilinear model:

$$f_k\left(x_{t+k}, c_t\right) = \exp\left(z_{t+k}^{T} W_k c_t\right)$$
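
In code, this score is typically computed in log-space, with one learned linear map $W_k$ per prediction step $k$. A minimal sketch, assuming PyTorch and the dimensions from the encoder sketch above (the names `W` and `log_f_k` are illustrative):

```python
import torch
import torch.nn as nn

latent_dim, context_dim, max_steps = 256, 256, 12
# One learned W_k per future offset k = 1..max_steps.
W = nn.ModuleList(nn.Linear(context_dim, latent_dim, bias=False)
                  for _ in range(max_steps))

def log_f_k(z_future, c_t, k):
    """log f_k(x_{t+k}, c_t) = z_{t+k}^T W_k c_t; the exp is folded into the softmax of the loss."""
    pred = W[k - 1](c_t)                    # (batch, latent_dim): W_k c_t
    return (z_future * pred).sum(dim=-1)    # (batch,): bilinear score per example
```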

Any type of encoder and autoregressive model can be used. The authors opt for strided convolutional layers with residual blocks for the encoder and GRUs for the autoregressive model.

The encoder and autoregressive model are trained jointly to minimize an InfoNCE loss, which scores the positive future sample against a set of negatives (see the sketch below).
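
As a rough illustration, InfoNCE treats the positive pair $(z_{t+k}, c_t)$ as the correct class among a set of negatives and applies a categorical cross-entropy to the scores. A minimal sketch, assuming PyTorch, with negatives drawn from the other sequences in the minibatch (a common simplification; the paper samples negatives from a proposal distribution):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z, c, W_k, k):
    """z: (batch, steps, latent_dim) latents; c: (batch, steps, context_dim) contexts;
    W_k: the linear map for prediction step k."""
    batch, steps, _ = z.shape
    t = steps - k - 1                       # one anchor position with a valid target
    pred = W_k(c[:, t])                     # (batch, latent_dim): W_k c_t
    targets = z[:, t + k]                   # (batch, latent_dim): true z_{t+k}
    # Score every prediction against every target; diagonal entries are positives.
    logits = pred @ targets.T               # (batch, batch)
    labels = torch.arange(batch, device=z.device)
    # Cross-entropy of identifying the positive among the batch of candidates.
    return F.cross_entropy(logits, labels)
```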