What is: Demon?
Source | Demon: Improved Neural Network Training with Momentum Decay |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
Decaying Momentum, or Demon, is a stochastic optimizer motivated by decaying the total contribution of a gradient to all future updates. By decaying the momentum parameter, the total contribution of a gradient to all future updates is decayed. A particular gradient term $g_{t}$ contributes a total of $\eta\sum_{i}\beta^{i}$ of its "energy" to all future gradient updates, and this results in the geometric sum, $\sum_{i=1}^{\infty}\beta^{i} = \beta/\left(1-\beta\right)$. Decaying this sum results in the Demon algorithm. Letting $\beta_{\text{init}}$ be the initial $\beta$; then at the current step $t$ with total $T$ steps, the decay routine is given by solving the below for $\beta_{t}$:

$$\frac{\beta_{t}}{1-\beta_{t}} = \left(1 - \frac{t}{T}\right)\frac{\beta_{\text{init}}}{1-\beta_{\text{init}}}$$
Where $\left(1 - t/T\right)$ refers to the proportion of iterations remaining. Note that Demon typically requires no hyperparameter tuning, as $\beta$ is usually decayed to $0$ or a small negative value at time $T$. Improved performance is observed by delaying the start of the decay. Demon can be applied to any gradient descent algorithm with a momentum parameter.
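To illustrate how the schedule plugs into a momentum method, here is a hedged sketch of SGD with momentum using the Demon schedule, building on the `demon_beta` helper above. The names `demon_sgd` and `grad_fn` are hypothetical, and the update is plain momentum SGD under my assumptions rather than any particular library's API:

```python
import numpy as np

def demon_sgd(grad_fn, w, lr=0.01, beta_init=0.9, T=1000):
    # SGD with momentum, where the momentum parameter beta_t
    # follows the Demon decay schedule at every step.
    # grad_fn(w) is a hypothetical callback returning the stochastic gradient at w.
    m = np.zeros_like(w)
    for t in range(T):
        beta_t = demon_beta(t, T, beta_init)  # decayed momentum parameter
        g = grad_fn(w)
        m = beta_t * m + g                    # momentum buffer with decayed beta
        w = w - lr * m
    return w
```

The same substitution of a fixed $\beta$ by $\beta_{t}$ can be made in any other momentum-based optimizer, such as Adam.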