Viet-Anh on Software

What is: Online Normalization?

Source: Online Normalization for Training Neural Networks
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Online Normalization is a normalization technique for training deep neural networks. To define Online Normalization, we replace arithmetic averages over the full dataset with exponentially decaying averages of online samples. The decay factors $\alpha_{f}$ and $\alpha_{b}$, for the forward and backward passes respectively, are hyperparameters of the technique.

We allow incoming samples $x_{t}$, such as images, to have multiple scalar components, and denote the feature-wide mean and variance by $\mu\left(x_{t}\right)$ and $\sigma^{2}\left(x_{t}\right)$. The algorithm also applies to outputs of fully connected layers with only one scalar output per feature. In fact, this case simplifies to $\mu\left(x_{t}\right) = x_{t}$ and $\sigma\left(x_{t}\right) = 0$. We use the scalars $\mu_{t}$ and $\sigma_{t}$ to denote running estimates of the mean and variance across all samples. The subscript $t$ denotes time steps corresponding to processing new incoming samples.
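To make the feature-wide statistics concrete, here is a small NumPy sketch (the array shapes and values are illustrative assumptions, not from the paper): each feature of a sample has several scalar components, and $\mu(x_t)$, $\sigma^2(x_t)$ average over those components. It also checks the degenerate fully connected case.

```python
import numpy as np

# Hypothetical activation sample with 3 features (rows), each feature
# having multiple scalar components (e.g. spatial positions).
x_t = np.array([[1.0, 3.0, 5.0],   # feature 0
                [2.0, 2.0, 2.0],   # feature 1
                [0.0, 4.0, 8.0]])  # feature 2

# Feature-wide statistics: average over each feature's scalar components.
mu_xt = x_t.mean(axis=1)   # mu(x_t): one scalar per feature
var_xt = x_t.var(axis=1)   # sigma^2(x_t): one scalar per feature

# A fully connected output has one scalar per feature, so the statistics
# degenerate: mu(x_t) = x_t and sigma^2(x_t) = 0.
fc_out = np.array([[0.7], [-1.2]])
assert np.allclose(fc_out.mean(axis=1, keepdims=True), fc_out)
assert np.allclose(fc_out.var(axis=1), 0.0)
```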

Online Normalization uses an ongoing process during the forward pass to estimate activation means and variances. It implements the standard online computation of mean and variance generalized to processing multi-value samples and exponential averaging of sample statistics. The resulting estimates directly lead to an affine normalization transform.

$$y_{t} = \frac{x_{t} - \mu_{t-1}}{\sigma_{t-1}}$$

$$\mu_{t} = \alpha_{f}\mu_{t-1} + \left(1-\alpha_{f}\right)\mu\left(x_{t}\right)$$

$$\sigma^{2}_{t} = \alpha_{f}\sigma^{2}_{t-1} + \left(1-\alpha_{f}\right)\sigma^{2}\left(x_{t}\right) + \alpha_{f}\left(1-\alpha_{f}\right)\left(\mu\left(x_{t}\right) - \mu_{t-1}\right)^{2}$$
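The forward-pass updates above can be sketched in plain NumPy. This is a minimal illustration of the three equations, not the paper's implementation: the function name, initialization of $\mu_{0}$ and $\sigma^{2}_{0}$, and the small `eps` added for numerical stability are assumptions, and the backward-pass decay $\alpha_{b}$ is not handled here.

```python
import numpy as np

def online_norm_forward(samples, alpha_f=0.999, eps=1e-5):
    """Sketch of the Online Normalization forward pass over a stream of
    samples, each of shape (n_features, n_components)."""
    n_features = samples[0].shape[0]
    mu = np.zeros(n_features)    # running mean estimate mu_0 (assumed init)
    var = np.ones(n_features)    # running variance sigma^2_0 (assumed init)
    outputs = []
    for x_t in samples:
        mu_x = x_t.mean(axis=-1)   # feature-wide mean mu(x_t)
        var_x = x_t.var(axis=-1)   # feature-wide variance sigma^2(x_t)
        # Normalize with the estimates from step t-1.
        y_t = (x_t - mu[:, None]) / np.sqrt(var[:, None] + eps)
        # Exponentially decaying updates of the running estimates.
        new_mu = alpha_f * mu + (1 - alpha_f) * mu_x
        var = (alpha_f * var + (1 - alpha_f) * var_x
               + alpha_f * (1 - alpha_f) * (mu_x - mu) ** 2)
        mu = new_mu
        outputs.append(y_t)
    return outputs, mu, var
```

Feeding a stream of random samples drawn from a fixed distribution, the running estimates `mu` and `var` drift toward that distribution's mean and variance, while each `y_t` is normalized using only statistics from previous samples.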