What is: Affine Operator?
Source | ResMLP: Feedforward networks for image classification with data-efficient training |
Year | 2000 |
Data Source | CC BY-SA - https://paperswithcode.com |
The Affine Operator is an affine transformation layer introduced in the ResMLP architecture. This replaces layer normalization, as in Transformer based networks, which is possible since in the ResMLP, there are no self-attention layers which makes training more stable - hence allowing a more simple affine transformation.
The affine operator is defined as:
where and are learnable weight vectors. This operation only rescales and shifts the input element-wise. This operation has several advantages over other normalization operations: first, as opposed to Layer Normalization, it has no cost at inference time, since it can absorbed in the adjacent linear layer. Second, as opposed to BatchNorm and Layer Normalization, the Aff operator does not depend on batch statistics.