
What is: Fixup Initialization?

Source: Fixup Initialization: Residual Learning Without Normalization
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Fixup Initialization, or Fixed-Update Initialization, is an initialization method that rescales the standard initialization of residual branches to account for the network architecture. Fixup aims to enable stable training of very deep residual networks at a maximal learning rate without normalization.

The steps are as follows:

  1. Initialize the classification layer and the last layer of each residual branch to 0.

  2. Initialize every other layer using a standard method, e.g. Kaiming Initialization, and scale only the weight layers inside residual branches by $L^{-\frac{1}{2m-2}}$, where $L$ is the number of residual branches and $m$ is the number of layers inside each branch.

  3. Add a scalar multiplier (initialized at 1) in every branch and a scalar bias (initialized at 0) before each convolution, linear, and element-wise activation layer.
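
Putting the three steps together, here is a minimal PyTorch-style sketch for a toy residual block with $m = 2$ weight layers; with $m = 2$ the exponent is $-\frac{1}{2}$, so each inner weight layer is scaled by $1/\sqrt{L}$. The names `FixupBasicBlock`, `fixup_init`, and `num_branches` are illustrative assumptions, not taken from the paper or its reference implementation.

```python
import torch
import torch.nn as nn


class FixupBasicBlock(nn.Module):
    """A residual branch with m = 2 weight layers, Fixup scalars, and no normalization."""

    def __init__(self, channels: int):
        super().__init__()
        # Step 3: scalar biases (initialized at 0) before each convolution and activation,
        # plus one scalar multiplier (initialized at 1) per branch.
        self.bias1a = nn.Parameter(torch.zeros(1))
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bias1b = nn.Parameter(torch.zeros(1))
        self.relu = nn.ReLU(inplace=True)
        self.bias2a = nn.Parameter(torch.zeros(1))
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.scale = nn.Parameter(torch.ones(1))
        self.bias2b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        out = self.conv1(x + self.bias1a)
        out = self.relu(out + self.bias1b)
        out = self.conv2(out + self.bias2a)
        out = out * self.scale + self.bias2b
        return self.relu(out + x)  # identity shortcut


def fixup_init(model: nn.Module, num_branches: int, m: int = 2) -> None:
    """Steps 1-2: zero the classifier and the last layer of each residual branch,
    Kaiming-initialize the other branch layers, and rescale them by L^(-1/(2m-2))."""
    scale = num_branches ** (-1.0 / (2 * m - 2))
    for module in model.modules():
        if isinstance(module, FixupBasicBlock):
            nn.init.kaiming_normal_(module.conv1.weight)  # standard init ...
            with torch.no_grad():
                module.conv1.weight.mul_(scale)           # ... rescaled by L^(-1/(2m-2))
            nn.init.zeros_(module.conv2.weight)           # last layer of the branch -> 0
        elif isinstance(module, nn.Linear):               # classification layer -> 0
            nn.init.zeros_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)


# Toy usage: L = 4 residual branches followed by a zero-initialized classifier.
# Layers outside the residual branches keep their standard initialization.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    *[FixupBasicBlock(16) for _ in range(4)],
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
fixup_init(model, num_branches=4)
```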