What is: Activation Regularization?
Source | Revisiting Activation Regularization for Language RNNs
Year | 2017
Data Source | CC BY-SA - https://paperswithcode.com
Activation Regularization (AR), or $L_2$ activation regularization, is regularization performed on activations as opposed to weights. It is usually used in conjunction with RNNs. It is defined as:

$$\alpha L_2(m \circ h_t)$$

where $m$ is the dropout mask used by later parts of the model, $L_2$ is the $L_2$ norm, $h_t$ is the output of the RNN at timestep $t$, and $\alpha$ is a scaling coefficient.
When applied to the output of a dense layer, AR penalizes activations that deviate substantially from 0, encouraging them to remain small.
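As a minimal sketch, the penalty can be computed from the RNN outputs and added to the task loss. The PyTorch snippet below is illustrative: the function name, tensor shapes, and the default $\alpha = 2$ are assumptions, and it uses the squared-$L_2$ mean found in common AWD-LSTM-style implementations rather than a strict $L_2$ norm.

```python
import torch

def activation_regularization(rnn_output: torch.Tensor,
                              dropout_mask: torch.Tensor,
                              alpha: float = 2.0) -> torch.Tensor:
    # AR penalty: alpha * squared-L2 mean of the masked activations,
    # i.e. the alpha * L2(m o h_t) term from the definition above.
    # rnn_output:   h_t stacked over time, shape (seq_len, batch, hidden)
    # dropout_mask: the mask m reused by later parts of the model
    # alpha:        scaling coefficient (default here is an assumption)
    return alpha * (dropout_mask * rnn_output).pow(2).mean()

# Usage: add the penalty to the task loss before backpropagation.
h = torch.randn(35, 20, 400, requires_grad=True)  # dummy RNN outputs
m = (torch.rand_like(h) > 0.5).float() / 0.5      # dummy inverted-dropout mask
task_loss = torch.tensor(0.0)                     # stand-in for e.g. cross-entropy
loss = task_loss + activation_regularization(h, m)
loss.backward()
```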