
What is: Demon Adam?

Source: Demon: Improved Neural Network Training with Momentum Decay
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

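Demon Adam is a stochastic optimizer that applies the Demon (decaying momentum) rule to the Adam optimizer. Instead of a fixed momentum coefficient, it uses a coefficient $\beta_t$ that decays from its initial value $\beta_{init}$ at step $t = 0$ to zero at the final step $t = T$, where $T$ is the total number of training steps. Below, $g_t$ is the gradient, $m_t$ and $v_t$ are the first and second moment estimates, $\eta$ is the learning rate, and the index $i$ denotes a single parameter coordinate.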

$$\beta_{t} = \beta_{init}\cdot\frac{\left(1-\frac{t}{T}\right)}{\left(1-\beta_{init}\right) + \beta_{init}\left(1-\frac{t}{T}\right)}$$
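The decay schedule above can be sketched as a small Python function. This is a minimal illustration rather than a reference implementation; the name `demon_beta` and the default `beta_init=0.9` are assumptions made here, not taken from the source.

```python
def demon_beta(t: int, T: int, beta_init: float = 0.9) -> float:
    """Momentum coefficient beta_t at step t out of T total steps.

    `demon_beta` and its defaults are hypothetical names chosen for
    this sketch.
    """
    remaining = 1.0 - t / T  # fraction of training still to go
    return beta_init * remaining / ((1.0 - beta_init) + beta_init * remaining)


if __name__ == "__main__":
    # Print the schedule at a few points of a 100-step run.
    for step in (0, 25, 50, 75, 100):
        print(step, demon_beta(step, T=100))
```

At `t = 0` the function returns `beta_init`, and at `t = T` it returns zero, so the momentum is fully switched off by the end of training.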

$$m_{t, i} = g_{t, i} + \beta_{t}m_{t-1, i}$$

$$v_{t} = \beta_{2}v_{t-1} + \left(1-\beta_{2}\right)g^{2}_{t}$$

$$\theta_{t} = \theta_{t-1} - \eta\frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon}$$
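Put together, the update rules can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function name `demon_adam`, the `grad_fn` callback, and all default hyperparameters are hypothetical. The page does not define the hatted quantities, so the sketch assumes $\hat{v}_t$ is the Adam-style bias-corrected second moment $v_t / (1 - \beta_2^t)$ and takes $\hat{m}_t = m_t$ without correction.

```python
import numpy as np


def demon_adam(grad_fn, theta0, T, lr=0.01, beta_init=0.9, beta2=0.999, eps=1e-8):
    """Run T steps of a Demon Adam sketch and return the final parameters.

    grad_fn: callable mapping parameters to their gradient (hypothetical API).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)  # first moment (momentum buffer)
    v = np.zeros_like(theta)  # second moment (running mean of squared gradients)

    for t in range(1, T + 1):
        g = grad_fn(theta)

        # Decaying momentum coefficient: beta_init at the start, 0 at t = T.
        remaining = 1.0 - t / T
        beta_t = beta_init * remaining / ((1.0 - beta_init) + beta_init * remaining)

        # Moment updates, applied element-wise to every coordinate i.
        m = g + beta_t * m
        v = beta2 * v + (1.0 - beta2) * g * g

        # ASSUMPTION: Adam-style bias correction for v, none for m.
        v_hat = v / (1.0 - beta2 ** t)
        m_hat = m

        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)

    return theta


if __name__ == "__main__":
    # Toy problem: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
    result = demon_adam(lambda th: th, theta0=[1.0, -2.0], T=1000)
    print(result)
```

Because the momentum buffer is not scaled by $(1 - \beta_t)$, early steps can be noticeably larger than the nominal learning rate, so `lr` may need to be smaller than a typical Adam setting.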