What is: Fixup Initialization?
Source | Fixup Initialization: Residual Learning Without Normalization |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
Fixup Initialization, or Fixed-Update Initialization, is an initialization method that rescales the standard initialization of residual branches to account for the network architecture. Fixup aims to enable stable training of very deep residual networks at a maximal learning rate without normalization.
The steps are as follows (a minimal code sketch follows the list):

- Initialize the classification layer and the last layer of each residual branch to 0.
- Initialize every other layer using a standard method, e.g. Kaiming Initialization, and scale only the weight layers inside residual branches by $L^{-\frac{1}{2m-2}}$, where $L$ is the number of residual branches and $m$ is the number of layers inside each branch.
- Add a scalar multiplier (initialized at 1) in every branch and a scalar bias (initialized at 0) before each convolution, linear, and element-wise activation layer.
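The recipe above maps directly onto a few lines of PyTorch. The sketch below is illustrative rather than the authors' reference implementation: it applies Fixup to a toy fully-connected residual network (names such as `FixupBlock`, `FixupNet`, `width`, and `num_blocks` are assumptions made here), zero-initializing the classifier and the last layer of each branch, rescaling the other branch layer by $L^{-\frac{1}{2m-2}}$, and adding the scalar multiplier and biases.

```python
import torch
import torch.nn as nn


class FixupBlock(nn.Module):
    """A residual branch with m = 2 linear layers, no normalization (sketch)."""

    def __init__(self, width: int, num_blocks: int, branch_depth: int = 2):
        super().__init__()
        self.fc1 = nn.Linear(width, width, bias=False)
        self.fc2 = nn.Linear(width, width, bias=False)
        self.relu = nn.ReLU(inplace=True)

        # Scalar biases (initialized at 0) before each weight layer / activation,
        # and a scalar multiplier (initialized at 1) on the residual branch.
        self.bias1a = nn.Parameter(torch.zeros(1))
        self.bias1b = nn.Parameter(torch.zeros(1))
        self.bias2a = nn.Parameter(torch.zeros(1))
        self.bias2b = nn.Parameter(torch.zeros(1))
        self.scale = nn.Parameter(torch.ones(1))

        # Standard (Kaiming) init, then rescale by L^(-1/(2m-2)),
        # where L is the number of residual branches and m the branch depth.
        nn.init.kaiming_normal_(self.fc1.weight, nonlinearity="relu")
        with torch.no_grad():
            self.fc1.weight.mul_(num_blocks ** (-1.0 / (2 * branch_depth - 2)))

        # Last layer of the residual branch starts at 0, so the block is the
        # identity map at initialization.
        nn.init.zeros_(self.fc2.weight)

    def forward(self, x):
        out = self.fc1(x + self.bias1a)
        out = self.relu(out + self.bias1b)
        out = self.fc2(out + self.bias2a)
        return x + self.scale * out + self.bias2b


class FixupNet(nn.Module):
    def __init__(self, width: int = 64, num_blocks: int = 8, num_classes: int = 10):
        super().__init__()
        self.blocks = nn.ModuleList(
            [FixupBlock(width, num_blocks) for _ in range(num_blocks)]
        )
        self.classifier = nn.Linear(width, num_classes)
        # Classification layer initialized to 0.
        nn.init.zeros_(self.classifier.weight)
        nn.init.zeros_(self.classifier.bias)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return self.classifier(x)


net = FixupNet()
logits = net(torch.randn(4, 64))  # all zeros at initialization
```

At initialization every residual block reduces to the identity map and the classifier outputs zeros, which is the property Fixup relies on to train very deep residual networks at a large learning rate without normalization layers.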