
What is: Activation Regularization?

Source: Revisiting Activation Regularization for Language RNNs
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Activation Regularization (AR), or $L_{2}$ activation regularization, is regularization performed on activations as opposed to weights. It is usually used in conjunction with RNNs. It is defined as:

$$\alpha L_{2}\left(m \circ h_{t}\right)$$

where $m$ is the dropout mask used by later parts of the model, $L_{2}$ is the $L_{2}$ norm, $h_{t}$ is the output of the RNN at timestep $t$, and $\alpha$ is a scaling coefficient.
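
In practice, the penalty is computed from the (masked) RNN outputs and added to the task loss before backpropagation. Below is a minimal sketch in PyTorch; the function name `activation_regularization` and the default value of $\alpha$ are illustrative, and, as is common in implementations, the $L_{2}$ penalty is realized as the mean of the squared masked activations.

```python
import torch

def activation_regularization(h_t: torch.Tensor,
                              dropout_mask: torch.Tensor,
                              alpha: float = 2.0) -> torch.Tensor:
    """AR penalty: alpha * L2(m ∘ h_t).

    h_t          -- RNN output, e.g. shape (seq_len, batch, hidden).
    dropout_mask -- the mask m applied by later parts of the model.
    alpha        -- scaling coefficient (the default here is illustrative).
    """
    # Mean of squared masked activations: penalizes large activations,
    # pushing them toward zero.
    return alpha * (dropout_mask * h_t).pow(2).mean()

# Usage sketch: add the penalty to the task loss before backprop.
# loss = cross_entropy(logits, targets)
# loss = loss + activation_regularization(rnn_output, mask)
```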

When applied to the output of a dense layer, AR penalizes activations that deviate substantially from 0, encouraging them to remain small.