What is: ReZero?
Source | ReZero is All You Need: Fast Convergence at Large Depth |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
ReZero is a normalization approach that dynamically facilitates well-behaved gradients and arbitrarily deep signal propagation. The idea is simple: ReZero initializes each layer to perform the identity operation. For each layer, a residual connection is introduced for the input signal $x_i$, along with one trainable parameter $\alpha_i$ that modulates the non-trivial transformation of the layer, $F(x_i)$:

$$x_{i+1} = x_i + \alpha_i F(x_i)$$

where $\alpha_i = 0$ at the beginning of training. Initially the gradients for all parameters defining $F$ vanish, but they dynamically evolve to suitable values during the initial stages of training. The architecture is illustrated in the Figure.
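
The following is a minimal sketch of the idea in plain Python, not the authors' implementation. It assumes a toy element-wise transformation, tanh(weight * x), as a stand-in for a real layer's transformation, and it estimates gradients with finite differences rather than automatic differentiation. The names `rezero_layer`, `rezero_stack`, `loss` and `finite_diff` are hypothetical and chosen only for illustration. The sketch checks two properties described above: a deep stack is exactly the identity when every residual weight starts at zero, and at that point the gradient with respect to the transformation's own parameter vanishes while the gradient with respect to the residual weight does not.

```python
import math

def rezero_layer(x, alpha, weight):
    """One ReZero-style layer: x + alpha * F(x).

    F(x) = tanh(weight * x) is an illustrative stand-in (an assumption),
    not the transformation used in any particular architecture.
    """
    return [xi + alpha * math.tanh(weight * xi) for xi in x]

def rezero_stack(x, alphas, weights):
    """Apply the layers in sequence, one (alpha, weight) pair per layer."""
    for alpha, weight in zip(alphas, weights):
        x = rezero_layer(x, alpha, weight)
    return x

depth = 64
alphas = [0.0] * depth                            # every residual weight starts at zero
weights = [0.5 + 0.01 * i for i in range(depth)]  # arbitrary illustrative values
x = [0.3, -1.2, 2.0]

# With all residual weights at zero, the whole stack returns its input unchanged.
identity_at_init = rezero_stack(x, alphas, weights) == x

def loss(alphas, weights):
    """Toy objective: half the squared norm of the stack's output."""
    out = rezero_stack(x, alphas, weights)
    return 0.5 * sum(o * o for o in out)

def finite_diff(params, index, fn, eps=1e-6):
    """Forward-difference estimate of d fn / d params[index]."""
    bumped = list(params)
    bumped[index] += eps
    return (fn(bumped) - fn(params)) / eps

# Gradient w.r.t. the first layer's transformation parameter (vanishes at init).
grad_weight = finite_diff(weights, 0, lambda w: loss(alphas, w))
# Gradient w.r.t. the first layer's residual weight (does not vanish).
grad_alpha = finite_diff(alphas, 0, lambda a: loss(a, weights))

print("identity at init:", identity_at_init)
print("d loss / d weight[0]:", grad_weight)
print("d loss / d alpha[0]:", grad_alpha)
```

Because the residual weight receives a non-zero gradient from the first step, it can move away from zero, after which the parameters of the transformation begin to receive gradient as well.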