Viet-Anh on Software

What is: Group Normalization?

Source: Group Normalization
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

Group Normalization (GN) is a normalization layer that divides channels into groups and normalizes the features within each group. GN does not exploit the batch dimension, so its computation is independent of the batch size. When each group contains a single channel, GN is equivalent to Instance Normalization; when all channels form a single group, it is equivalent to Layer Normalization.
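The properties above can be checked numerically. The following is a minimal NumPy sketch, assuming an (N, C, H, W) layout and omitting the learnable per-channel scale and shift that GN layers usually carry; `group_norm`, `instance_norm`, and `layer_norm` are illustrative helpers, not a library API.

```python
import numpy as np

def group_norm(x: np.ndarray, num_groups: int, eps: float = 1e-5) -> np.ndarray:
    """Normalize an (N, C, H, W) array within channel groups (no scale/shift)."""
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("C must be divisible by num_groups")
    # Split the channel axis into G groups of C/G consecutive channels.
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    # Statistics are per sample and per group; the batch axis is never reduced.
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    return ((xg - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

def instance_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-sample, per-channel normalization over (H, W)."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-sample normalization over (C, H, W)."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 5, 5))

# Normalizing one sample alone matches normalizing it inside the batch.
batch_independent = bool(np.allclose(group_norm(x, 4)[:1], group_norm(x[:1], 4)))
# One channel per group (G = C) reduces to Instance Normalization.
matches_in = bool(np.allclose(group_norm(x, 8), instance_norm(x)))
# A single group (G = 1) reduces to Layer Normalization over (C, H, W).
matches_ln = bool(np.allclose(group_norm(x, 1), layer_norm(x)))
print(batch_independent, matches_in, matches_ln)  # True True True
```

The reshape trick keeps the sketch vectorized; it relies on the channels of each group being stored next to each other along the C axis.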

As motivation for the method, many classical features such as SIFT and HOG are group-wise features that involve group-wise normalization. For example, a HOG vector is the outcome of several spatial cells, each represented by a normalized orientation histogram.

Formally, Group Normalization is defined as:

$$\mu_{i} = \frac{1}{m}\sum_{k\in\mathcal{S}_{i}} x_{k}$$

$$\sigma^{2}_{i} = \frac{1}{m}\sum_{k\in\mathcal{S}_{i}} \left(x_{k}-\mu_{i}\right)^{2}$$

$$\hat{x}_{i} = \frac{x_{i} - \mu_{i}}{\sqrt{\sigma^{2}_{i}+\epsilon}}$$
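The three formulas can be traced on a plain list holding the m values x_k for k in S_i. Because every index in S_i has the same set S_i, one mean and one variance serve the whole group. This is a plain-Python sketch; `normalize_in_set` is a hypothetical helper and eps = 1e-5 is an assumed value.

```python
import math

def normalize_in_set(values, eps=1e-5):
    """Apply the mean, variance, and normalization formulas to the m values x_k, k in S_i."""
    m = len(values)
    mu = sum(values) / m                          # mu_i = (1/m) * sum of x_k
    var = sum((v - mu) ** 2 for v in values) / m  # sigma^2_i, divided by m (not m - 1)
    # Every element of S_i shares mu_i and sigma^2_i, so normalize them all at once.
    normed = [(v - mu) / math.sqrt(var + eps) for v in values]
    return normed, mu, var

normed, mu, var = normalize_in_set([1.0, 2.0, 3.0, 4.0])
print(mu, var)                        # 2.5 1.25
print([round(v, 4) for v in normed])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```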

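Here $x$ is the feature computed by a layer, and $i$ is an index; for 2D images, $i = (i_N, i_C, i_H, i_W)$ indexes the features in $(N, C, H, W)$ order. $m$ is the size of the set $\mathcal{S}_i$, and $\epsilon$ is a small constant for numerical stability. A Group Norm layer computes $\mu$ and $\sigma$ over the set $\mathcal{S}_i$ defined as:

$$\mathcal{S}_{i} = \left\{ k \mid k_{N} = i_{N}, \left\lfloor \frac{k_{C}}{C/G} \right\rfloor = \left\lfloor \frac{i_{C}}{C/G} \right\rfloor \right\}$$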
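To make the set definition concrete, the sketch below enumerates S_i literally for a tiny tensor shape, testing the two conditions on every 4D index k. The shape, the index i, and the helper name `group_set` are illustrative assumptions; integer floor division `//` plays the role of the floor operation, which holds because G is assumed to divide C.

```python
from itertools import product

def group_set(i, shape, G):
    """Return S_i: every index k in the same sample and channel group as i."""
    N, C, H, W = shape
    per_group = C // G  # C/G channels per group (assumes G divides C)
    i_n, i_c, _, _ = i
    return [
        k for k in product(range(N), range(C), range(H), range(W))
        if k[0] == i_n and k[1] // per_group == i_c // per_group
    ]

shape = (2, 6, 3, 3)  # (N, C, H, W), with G = 3 so each group has 2 channels
S = group_set((1, 4, 0, 2), shape, G=3)
print(len(S))                     # m = (C/G) * H * W = 2 * 3 * 3 = 18
print(sorted({k[1] for k in S}))  # [4, 5]
```

Note that H and W are unconstrained, so the group spans the full spatial extent of its channels, while the batch index is fixed to that of i.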

Here $G$ is the number of groups, a pre-defined hyper-parameter ($G = 32$ by default), and $C/G$ is the number of channels per group. The condition $k_N = i_N$ restricts the statistics to a single sample. $\lfloor\cdot\rfloor$ is the floor operation, so the second condition means that the indices $i$ and $k$ fall in the same group of channels, assuming the channels of each group are stored sequentially along the $C$ axis.
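As a quick check of the sequential-storage assumption, the NumPy sketch below compares the floor rule with splitting the channel axis into G consecutive blocks of C/G channels, as a reshape-based implementation would. C = 64 is an illustrative choice paired with the default G = 32.

```python
import numpy as np

C, G = 64, 32                  # illustrative channel count with the default G = 32
assert C % G == 0, "this grouping assumes G divides C"
per_group = C // G             # C/G = 2 channels per group

# Group index of each channel according to the floor rule floor(c / (C/G)).
floor_ids = np.arange(C) // per_group

# Reshape-based grouping: row g lists the channels that land in group g when the
# channel axis is split into G consecutive blocks of C/G channels.
channels = np.arange(C).reshape(G, per_group)

# The two groupings agree because each group's channels are stored contiguously.
consistent = bool(np.all(floor_ids[channels] == np.arange(G)[:, None]))
print(per_group, channels[:3].tolist(), consistent)  # 2 [[0, 1], [2, 3], [4, 5]] True
```

If the channels were permuted before grouping, the reshape would no longer match the floor rule, which is why the definition states the ordering assumption explicitly.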