What is: Cosine Annealing?

Source: SGDR: Stochastic Gradient Descent with Warm Restarts
Data Source: CC BY-SA

Cosine Annealing is a type of learning rate schedule that starts with a large learning rate, decreases it relatively rapidly to a minimum value, and then increases it rapidly again. Resetting the learning rate acts like a simulated restart of the learning process. Re-using good weights as the starting point of the restart is referred to as a "warm restart", in contrast to a "cold restart", where a new set of small random numbers may be used as the starting point.

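Within the $i$-th run (the span between two restarts), the learning rate is set to

$$\eta_{t} = \eta_{min}^{i} + \frac{1}{2}\left(\eta_{max}^{i} - \eta_{min}^{i}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right)$$

where $\eta_{min}^{i}$ and $\eta_{max}^{i}$ are the bounds of the learning rate range, $T_{cur}$ accounts for how many epochs have been performed since the last restart, and $T_{i}$ is the number of epochs in the $i$-th run.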
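As a rough illustration, the schedule can be sketched in a few lines of plain Python. This is a minimal sketch, not a reference implementation. It assumes the SGDR rule eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)), where T_i is the length of the current run. It also assumes that T_cur advances once per epoch and that the range (eta_min, eta_max) stays fixed across restarts. The function names and the parameters `t_0` (length of the first run) and `t_mult` (factor by which each run grows) are hypothetical names chosen for this example.

```python
import math


def cosine_annealing_lr(t_cur, t_i, eta_min, eta_max):
    """Learning rate after t_cur epochs of a run that lasts t_i epochs."""
    cosine = math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + cosine)


def sgdr_schedule(total_epochs, t_0, t_mult=1, eta_min=0.0, eta_max=0.1):
    """Per-epoch learning rates with warm restarts (illustrative only)."""
    rates = []
    t_cur, t_i = 0, t_0
    for _ in range(total_epochs):
        rates.append(cosine_annealing_lr(t_cur, t_i, eta_min, eta_max))
        t_cur += 1
        if t_cur >= t_i:
            # Run finished: reset the epoch counter so the next epoch starts
            # again at eta_max. The model weights are NOT reset, which is
            # what makes this a "warm" restart.
            t_cur = 0
            t_i *= t_mult
    return rates


if __name__ == "__main__":
    for epoch, lr in enumerate(sgdr_schedule(6, t_0=3)):
        print(f"epoch {epoch}: lr = {lr:.4f}")
```

Because T_cur only takes whole-epoch values here, the rate jumps back to eta_max before it ever reaches eta_min exactly. Updating T_cur fractionally at every batch would give a smoother curve.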

Text Source: Jason Brownlee

Image Source: Gao Huang