What are Residual Multi-Layer Perceptrons (ResMLP)?
Source | ResMLP: Feedforward networks for image classification with data-efficient training |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
Residual Multi-Layer Perceptrons, or ResMLP, is an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. At the end of the network, the patch representations are average pooled and fed to a linear classifier.
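To make the alternation concrete, here is a minimal PyTorch sketch of one such residual block, assuming inputs of shape (batch, patches, channels). The module and parameter names (ResMLPBlock, cross_patch, cross_channel, expansion) are illustrative rather than the authors' code, and the affine normalization discussed next is omitted for brevity.

```python
import torch
import torch.nn as nn

class ResMLPBlock(nn.Module):
    """One ResMLP-style residual block: (i) cross-patch linear mixing,
    then (ii) a per-patch two-layer feed-forward network."""

    def __init__(self, dim: int, num_patches: int, expansion: int = 4):
        super().__init__()
        # (i) Linear layer in which patches interact; shared across channels.
        self.cross_patch = nn.Linear(num_patches, num_patches)
        # (ii) Two-layer feed-forward network mixing channels, per patch.
        self.cross_channel = nn.Sequential(
            nn.Linear(dim, expansion * dim),
            nn.GELU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim). Transpose so the linear layer
        # acts along the patch dimension, then transpose back.
        x = x + self.cross_patch(x.transpose(1, 2)).transpose(1, 2)
        x = x + self.cross_channel(x)
        return x

# After the final block, patch representations are average pooled over
# the patch dimension and fed to a linear classifier, e.g.:
# logits = nn.Linear(dim, num_classes)(x.mean(dim=1))
```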
Thanks to the absence of self-attention layers, training is more stable, which allows layer normalization to be replaced with a simpler affine transformation: $\mathrm{Aff}_{\alpha,\beta}(x) = \mathrm{Diag}(\alpha)\,x + \beta$, where $\alpha$ and $\beta$ are learnable weight vectors. The affine operator is applied at the beginning ("pre-normalization") and end ("post-normalization") of each residual block. As a pre-normalization, Aff replaces LayerNorm without using channel-wise statistics, and is initialized with $\alpha = \mathbf{1}$ and $\beta = \mathbf{0}$. As a post-normalization, Aff is similar to LayerScale, and $\alpha$ is initialized with the same small value.
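A minimal sketch of the Aff operator under the formula above; the init_alpha argument is a hypothetical knob covering both the pre-normalization case ($\alpha = \mathbf{1}$) and the LayerScale-like post-normalization case (small $\alpha$).

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Aff(x) = Diag(alpha) x + beta, applied per channel.
    Unlike LayerNorm, no mean/variance statistics are computed."""

    def __init__(self, dim: int, init_alpha: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(init_alpha * torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise rescale and shift along the channel dimension.
        return self.alpha * x + self.beta

# Pre-normalization: replaces LayerNorm, initialized with alpha = 1, beta = 0.
pre_aff = Affine(dim=384)
# Post-normalization: LayerScale-like, alpha initialized to a small value
# (e.g. 0.1; as with LayerScale, the exact value depends on network depth).
post_aff = Affine(dim=384, init_alpha=0.1)
```

Because Aff computes no statistics, it can also be folded into the adjacent linear layer at inference time.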