ReLIC, or Representation Learning via Invariant Causal Mechanisms, is a self-supervised learning objective that enforces invariant prediction of proxy targets across augmentations through an invariance regularizer which yields improved generalization guarantees.
We can write the objective as:
$$\underset{X}{\mathbb{E}} \; \underset{a_{lk},\, a_{qt} \sim \mathcal{A} \times \mathcal{A}}{\mathbb{E}} \sum_{b \in \left\{a_{lk},\, a_{qt}\right\}} \mathcal{L}_{b}\left(Y^{R}, f(X)\right) \quad \text{s.t.} \quad KL\left(p^{do\left(a_{lk}\right)}\left(Y^{R} \mid f(X)\right),\; p^{do\left(a_{qt}\right)}\left(Y^{R} \mid f(X)\right)\right) \leq \rho$$
where $\mathcal{L}$ is the proxy task loss and $KL$ is the Kullback-Leibler (KL) divergence. Note that any distance measure on distributions can be used in place of the KL divergence.
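As a small illustration of the constraint, the sketch below computes the KL divergence between two discrete predictive distributions and checks it against a tolerance. The helper name `kl_divergence`, the distributions `p_lk` and `p_qt`, and the value of `rho` are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid
    dividing by zero.
    """
    return sum(
        p_i * math.log(p_i / max(q_i, eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


# Hypothetical predictive distributions over three proxy targets,
# one per augmentation pair (a_lk and a_qt).
p_lk = [0.7, 0.2, 0.1]
p_qt = [0.6, 0.3, 0.1]
rho = 0.05  # hypothetical tolerance on the divergence

divergence = kl_divergence(p_lk, p_qt)
constraint_satisfied = divergence <= rho
print(f"KL = {divergence:.4f}, within rho: {constraint_satisfied}")
```

Swapping `kl_divergence` for another distance on distributions would leave the rest of the check unchanged, which mirrors the remark that the KL divergence is only one possible choice.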
Concretely, as the proxy task we associate with every datapoint $x_i$ the label $y_i^R = i$. This corresponds to the instance discrimination task, commonly used in contrastive learning. We take pairs of points $(x_i, x_j)$ to compute similarity scores and use pairs of augmentations $a_{lk} = (a_l, a_k) \in \mathcal{A} \times \mathcal{A}$ to perform a style intervention. Given a batch of samples $\{x_i\}_{i=1}^{N} \sim \mathcal{D}$, we use
$$p^{do(a_{lk})}\left(Y^{R} = j \mid f(x_i)\right) \propto \exp\left(\phi\left(f(x_i^{a_l}), h(x_j^{a_k})\right) / \tau\right)$$
with $x^a$ denoting the data point $x$ augmented with $a$, and $\tau$ a softmax temperature parameter. We encode $f$ using a neural network and choose $h$ to be related to $f$, e.g. $h = f$ or a network whose weights are an exponential moving average of the weights of $f$ (as in target networks). To compare representations we use the function $\phi(f(x_i), h(x_j)) = \langle g(f(x_i)), g(h(x_j)) \rangle$, where $g$ is a fully-connected neural network often called the critic.
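The predictive distribution above is a softmax over similarity scores, which can be sketched in a few lines. This is a minimal illustration, assuming the embeddings passed in are already critic outputs, i.e. $g(f(x_i^{a_l}))$ and $g(h(x_j^{a_k}))$. The function names, the toy vectors and the temperature value are hypothetical.

```python
import math


def inner_product(u, v):
    """phi: inner product between two critic outputs."""
    return sum(a * b for a, b in zip(u, v))


def predictive_distribution(online_embedding, target_embeddings, tau):
    """Softmax over similarity scores for a single datapoint x_i.

    online_embedding:  g(f(x_i^{a_l})), a list of floats
    target_embeddings: [g(h(x_j^{a_k})) for every j in the batch]
    Returns a list whose j-th entry is p(Y^R = j | f(x_i)).
    """
    logits = [inner_product(online_embedding, t) / tau for t in target_embeddings]
    shift = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy batch of three datapoints with 2-dimensional critic outputs (made-up values).
online_0 = [1.0, 0.0]  # x_0 under augmentation a_l
targets = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.9]]  # x_0, x_1, x_2 under a_k

probs = predictive_distribution(online_0, targets, tau=0.5)
predicted_index = probs.index(max(probs))
```

In this toy batch the highest probability falls on index 0, the other view of the same datapoint, which is what the instance discrimination task rewards.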
Combining these pieces, we learn representations by minimizing the following objective over the full set of data $x_i \in \mathcal{D}$ and augmentations $a_{lk} \in \mathcal{A} \times \mathcal{A}$:
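$$-\sum_{i=1}^{N} \sum_{a_{lk}} \log \frac{\exp\left(\phi\left(f(x_i^{a_l}), h(x_i^{a_k})\right) / \tau\right)}{\sum_{m=1}^{M} \exp\left(\phi\left(f(x_i^{a_l}), h(x_m^{a_k})\right) / \tau\right)} + \alpha \sum_{a_{lk},\, a_{qt}} KL\left(p^{do(a_{lk})},\, p^{do(a_{qt})}\right)$$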
with $M$ the number of points we use to construct the contrast set and $\alpha$ the weighting of the invariance penalty. The shorthand $p^{do(a)}$ is used for $p^{do(a)}\left(Y^{R} = j \mid f(x_i)\right)$. The Figure shows a schematic of the ReLIC objective.
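The two terms of this objective can be sketched on precomputed embeddings. This is an illustrative sketch rather than the authors' implementation, and it makes several assumptions: the contrast set is the batch itself ($M = N$), the KL term is summed over datapoints $i$ and over ordered pairs of distinct augmentation pairs, and the inputs are already critic outputs. All function names and toy values are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    shift = max(logits)
    log_norm = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return [z - log_norm for z in logits]


def log_predictive(online, target, tau):
    """Row i holds log p^{do(a_lk)}(Y^R = . | f(x_i)) over the batch."""
    return [
        log_softmax([sum(a * b for a, b in zip(o, t)) / tau for t in target])
        for o in online
    ]


def kl_from_logs(log_p, log_q):
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def relic_loss(views, alpha, tau):
    """views: one (online, target) pair of embedding lists per augmentation pair.

    online[i] stands for g(f(x_i^{a_l})) and target[j] for g(h(x_j^{a_k})).
    """
    log_dists = [log_predictive(online, target, tau) for online, target in views]
    n = len(log_dists[0])
    # Contrastive term: negative log-probability of the matching index i.
    contrastive = -sum(d[i][i] for d in log_dists for i in range(n))
    # Invariance penalty: KL between predictions under different augmentation pairs.
    invariance = sum(
        kl_from_logs(log_dists[a][i], log_dists[b][i])
        for a in range(len(log_dists))
        for b in range(len(log_dists))
        if a != b
        for i in range(n)
    )
    return contrastive + alpha * invariance


# Toy batch of two datapoints under two augmentation pairs (made-up values).
view_lk = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
view_qt = ([[0.8, 0.2], [0.1, 0.9]], [[0.9, 0.1], [0.2, 0.8]])

loss = relic_loss([view_lk, view_qt], alpha=1.0, tau=0.5)
loss_without_penalty = relic_loss([view_lk, view_qt], alpha=0.0, tau=0.5)
```

When the two augmentation pairs yield identical predictive distributions, the penalty vanishes and the value of `alpha` no longer affects the loss. A gap between `loss` and `loss_without_penalty` therefore measures how far the predictions are from being invariant.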