What is: Affine Operator?

The Affine Operator is an affine transformation layer introduced in the ResMLP architecture. This replaces layer normalization, as in Transformer based networks, which is possible since in the ResMLP, there are no self-attention layers which makes training more stable - hence allowing a more simple affine transformation.

The affine operator is defined as:

$\operatorname{Aff}_{\mathbf{\alpha}, \mathbf{\beta}}(\mathbf{x})=\operatorname{Diag}(\mathbf{\alpha}) \mathbf{x}+\mathbf{\beta}$

where $\alpha$ and $\beta$ are learnable weight vectors. This operation only rescales and shifts the input element-wise. This operation has several advantages over other normalization operations: first, as opposed to Layer Normalization, it has no cost at inference time, since it can absorbed in the adjacent linear layer. Second, as opposed to BatchNorm and Layer Normalization, the Aff operator does not depend on batch statistics.

Source	ResMLP: Feedforward networks for image classification with data-efficient training
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Affine Operator?

Viet-Anh on Software