What is: Demon CM?

Source: Demon: Improved Neural Network Training with Momentum Decay
Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

Demon CM, or SGD with Momentum and Demon, is the Demon momentum rule applied to SGD with momentum.

$$\beta_{t} = \beta_{init}\cdot\frac{\left(1-\frac{t}{T}\right)}{\left(1-\beta_{init}\right) + \beta_{init}\left(1-\frac{t}{T}\right)}$$

$$\theta_{t+1} = \theta_{t} - \eta g_{t} + \beta_{t}v_{t}$$

$$v_{t+1} = \beta_{t}v_{t} - \eta g_{t}$$
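
The schedule and the two update equations can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's reference implementation. It reads t as the current step, T as the total number of steps, η as the learning rate, g_t as the gradient and v_t as the velocity. The function names (`demon_beta`, `demon_cm_step`, `minimize_quadratic`), the toy objective f(θ) = 0.5·θ², and the hyperparameter values are assumptions chosen for illustration. The step function transcribes the equations exactly as written, so the parameter update uses the previous velocity v_t.

```python
def demon_beta(t, total_steps, beta_init):
    """Demon schedule for the momentum coefficient beta_t.

    Equals beta_init at t = 0 and reaches 0 at t = total_steps.
    """
    remaining = 1.0 - t / total_steps  # the (1 - t/T) factor
    return beta_init * remaining / ((1.0 - beta_init) + beta_init * remaining)


def demon_cm_step(theta, v, grad, t, total_steps, lr, beta_init):
    """One Demon CM update, following the two update equations as written."""
    beta_t = demon_beta(t, total_steps, beta_init)
    theta_next = theta - lr * grad + beta_t * v  # theta_{t+1}
    v_next = beta_t * v - lr * grad              # v_{t+1}
    return theta_next, v_next


def minimize_quadratic(theta0=5.0, total_steps=200, lr=0.1, beta_init=0.9):
    """Toy run (illustrative assumption): f(theta) = 0.5 * theta**2, so grad = theta."""
    theta, v = theta0, 0.0
    for t in range(total_steps):
        grad = theta  # gradient of the toy objective
        theta, v = demon_cm_step(theta, v, grad, t, total_steps, lr, beta_init)
    return theta


if __name__ == "__main__":
    # Print the momentum coefficient at the start, midpoint, and end of training.
    for step in (0, 100, 200):
        print(step, demon_beta(step, 200, 0.9))
    print("final theta:", minimize_quadratic())
```

In this sketch the momentum coefficient is recomputed at every step from `t` and `total_steps`, so the only extra state beyond ordinary SGD with momentum is the step counter.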