
What is: Entropy Regularization?

Source: Asynchronous Methods for Deep Reinforcement Learning
Year: 2016
Data Source: CC BY-SA - https://paperswithcode.com

Entropy regularization is a regularization technique used in reinforcement learning. In on-policy policy gradient methods such as A3C, the actor and critic reinforce each other's estimates, which tends to produce a policy $\pi(a \mid s)$ that is highly peaked on a few actions or action sequences, since it is easier for the actor and critic to over-optimise to a small portion of the environment. To reduce this problem, entropy regularization adds an entropy term to the loss to promote action diversity:

$$H(X) = -\sum_{x} \pi(x) \log \pi(x)$$
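As a rough illustration of the entropy term, the sketch below computes the entropy of a discrete action distribution and folds it into a single-sample policy gradient loss. It is a minimal example under stated assumptions, not the A3C implementation: the function names, the softmax parameterization, and the coefficient `beta=0.01` are all hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """H = -sum_x pi(x) log pi(x), treating 0 * log 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_policy_loss(logits, action, advantage, beta=0.01):
    """Single-sample policy gradient loss with an entropy bonus.

    beta is a hypothetical entropy coefficient chosen for illustration.
    """
    probs = softmax(logits)
    pg_loss = -math.log(probs[action]) * advantage
    # Subtracting beta * H means that minimizing the loss raises entropy.
    return pg_loss - beta * entropy(probs)


uniform = softmax([0.0, 0.0, 0.0, 0.0])
peaked = softmax([10.0, 0.0, 0.0, 0.0])
print(entropy(uniform))  # equals log(4), the maximum for four actions
print(entropy(peaked))   # much smaller: the policy is nearly deterministic
```

A uniform policy attains the maximum entropy and a peaked one has entropy near zero, so subtracting the scaled entropy from the loss penalizes the peaked policy relative to the diverse one.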
