What is: self-DIstillation with NO labels?

Source: Emerging Properties in Self-Supervised Vision Transformers
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

DINO (self-distillation with no labels) is a self-supervised learning method that directly predicts the output of a teacher network - built with a momentum encoder - using a standard cross-entropy loss.
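
As a rough sketch of that objective, the loss can be written out in notation borrowed from the source paper. The symbols $g_{\theta_s}$ and $g_{\theta_t}$ (student and teacher networks), $\tau_s$ and $\tau_t$ (their softmax temperatures), and $K$ (the output dimension) are assumptions introduced here for illustration. The student output is turned into a probability distribution $P_s$ by a temperature softmax, the teacher distribution $P_t$ is defined the same way with $g_{\theta_t}$ and $\tau_t$, and only the student parameters $\theta_s$ are optimized:

$$
P_s(x)^{(i)} = \frac{\exp\left(g_{\theta_s}(x)^{(i)} / \tau_s\right)}{\sum_{k=1}^{K} \exp\left(g_{\theta_s}(x)^{(k)} / \tau_s\right)}, \qquad
\min_{\theta_s} H\left(P_t(x), P_s(x)\right), \qquad
H(a, b) = -\sum_{k=1}^{K} a^{(k)} \log b^{(k)}
$$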

In the example to the right, DINO is illustrated for a single pair of views $(x_1, x_2)$ for simplicity. The model passes two different random transformations of an input image to the student and teacher networks. Both networks have the same architecture but different parameters. The output of the teacher network is centered with a mean computed over the batch. Each network outputs a $K$-dimensional feature that is normalized with a temperature softmax over the feature dimension. The similarity of the two outputs is then measured with a cross-entropy loss. A stop-gradient (sg) operator is applied to the teacher so that gradients propagate only through the student. The teacher parameters are updated with an exponential moving average (ema) of the student parameters.
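
The steps above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Plain lists stand in for batches of network outputs, there is no real network or autograd (so the stop-gradient appears only as a comment), and the function names (`dino_loss`, `update_center`, `ema_update`) as well as the temperature and momentum values are hypothetical choices made for this example.

```python
import math
import random


def log_softmax(logits, temperature):
    """Log of a temperature-scaled softmax over one K-dimensional vector."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_z for v in scaled]


def cross_entropy(teacher_out, student_out, center, t_temp, s_temp):
    """Batch mean of H(P_t, P_s) = -sum_k P_t[k] * log P_s[k]."""
    total = 0.0
    for t_row, s_row in zip(teacher_out, student_out):
        # Teacher: center, then sharpen with a temperature softmax.
        # In a real framework this branch would sit behind a stop-gradient.
        centered = [t - c for t, c in zip(t_row, center)]
        p_t = [math.exp(v) for v in log_softmax(centered, t_temp)]
        log_p_s = log_softmax(s_row, s_temp)
        total -= sum(p * lp for p, lp in zip(p_t, log_p_s))
    return total / len(teacher_out)


def dino_loss(s1, s2, t1, t2, center, t_temp=0.04, s_temp=0.1):
    """Symmetric loss for one pair of views: the teacher output on one view
    is the target for the student output on the other view."""
    return 0.5 * cross_entropy(t1, s2, center, t_temp, s_temp) \
         + 0.5 * cross_entropy(t2, s1, center, t_temp, s_temp)


def update_center(center, t1, t2, momentum=0.9):
    """Moving average of the teacher output's mean over the batch."""
    rows = t1 + t2
    batch_mean = [sum(col) / len(rows) for col in zip(*rows)]
    return [momentum * c + (1 - momentum) * b
            for c, b in zip(center, batch_mean)]


def ema_update(teacher_params, student_params, lam=0.996):
    """Teacher parameters follow an exponential moving average of the student's."""
    return [lam * t + (1 - lam) * s
            for t, s in zip(teacher_params, student_params)]


# Usage with random stand-ins for the network outputs (batch of 4, K = 8).
random.seed(0)
B, K = 4, 8
make_batch = lambda: [[random.uniform(-1, 1) for _ in range(K)] for _ in range(B)]
s1, s2, t1, t2 = make_batch(), make_batch(), make_batch(), make_batch()
center = [0.0] * K

loss = dino_loss(s1, s2, t1, t2, center)
center = update_center(center, t1, t2)
print("loss:", loss)
```

In a real training loop the student parameters would be updated by backpropagating `loss`, after which `ema_update` and `update_center` would be called once per step. The teacher temperature is set lower than the student temperature here so that the teacher targets are sharper, which is one common way to configure this kind of setup.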