
What is: Filter Response Normalization?

Source: Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Filter Response Normalization (FRN) combines normalization with an activation function and can be used as a replacement for other normalization schemes and activations. It operates on each activation channel of each batch element independently, eliminating the dependency on other batch elements.

To demonstrate, assume we are dealing with a feed-forward convolutional neural network. We follow the usual convention that the filter responses (activation maps) produced after a convolution operation form a 4D tensor $X$ with shape $[B, W, H, C]$, where $B$ is the mini-batch size, $W, H$ are the spatial extents of the map, and $C$ is the number of filters used in the convolution. $C$ is also referred to as the number of output channels. Let $x = X_{b,:,:,c} \in \mathcal{R}^{N}$, where $N = W \times H$, be the vector of filter responses for the $c^{th}$ filter for the $b^{th}$ batch point, and let $\nu^2 = \sum_i x_i^2 / N$ be the mean squared norm of $x$.

Then Filter Response Normalization is defined as the following:

$$\hat{x} = \frac{x}{\sqrt{\nu^2 + \epsilon}},$$

where $\epsilon$ is a small positive constant to prevent division by zero.
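
As a quick illustration of the computation above, here is a minimal PyTorch sketch that normalizes a channels-last tensor. The function name `frn_normalize` and the default `eps` value are illustrative assumptions, not code from the paper.

```python
import torch

def frn_normalize(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize filter responses as in the formula above.

    `x` is assumed to have shape [B, W, H, C], matching the layout in the
    text; `eps` is the small positive constant that prevents division by zero.
    """
    # nu^2: mean squared norm over the spatial extent (W, H),
    # computed independently for each batch element and channel.
    nu2 = x.pow(2).mean(dim=(1, 2), keepdim=True)  # shape [B, 1, 1, C]
    # x_hat = x / sqrt(nu^2 + eps)
    return x * torch.rsqrt(nu2 + eps)
```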

As in other normalization methods, the normalized values are then passed through a learned per-channel affine transform, $y = \gamma \hat{x} + \beta$. A lack of mean centering in FRN can lead to activations having an arbitrary bias away from zero. Such a bias, in conjunction with ReLU, can have a detrimental effect on learning and lead to poor performance and dead units. To address this, the authors augment ReLU with a learned threshold $\tau$, yielding the Thresholded Linear Unit (TLU):

$$z = \max(y, \tau)$$

Since $\max(y, \tau) = \max(y - \tau, 0) + \tau = \text{ReLU}(y - \tau) + \tau$, the effect of this activation is the same as having a shared bias before and after ReLU.
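
Putting the pieces together, the following is a minimal PyTorch sketch of a combined FRN + TLU layer for the channels-last $[B, W, H, C]$ layout used above. The class name `FRNLayer`, the parameter initializations, and the `eps` default are illustrative choices, not the reference implementation.

```python
import torch
import torch.nn as nn

class FRNLayer(nn.Module):
    """Sketch of an FRN + TLU layer for [B, W, H, C] inputs."""

    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Per-channel learned affine parameters applied after normalization.
        self.gamma = nn.Parameter(torch.ones(num_channels))
        self.beta = nn.Parameter(torch.zeros(num_channels))
        # Per-channel learned threshold tau for the TLU activation.
        self.tau = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # nu^2 over the spatial dimensions, per batch element and channel.
        nu2 = x.pow(2).mean(dim=(1, 2), keepdim=True)
        x_hat = x * torch.rsqrt(nu2 + self.eps)
        # Learned affine transform, then the thresholded activation z = max(y, tau).
        y = self.gamma * x_hat + self.beta
        return torch.max(y, self.tau)


# Usage: a batch of 8 feature maps of size 32x32 with 64 channels.
frn = FRNLayer(num_channels=64)
out = frn(torch.randn(8, 32, 32, 64))
```

Because every statistic is computed per batch element and per channel, the layer behaves identically at training and inference time and is insensitive to batch size, which is the property the method is designed around.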