
What is: AMSGrad?

Source: On the Convergence of Adam and Beyond
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

AMSGrad is a stochastic optimization method that seeks to fix a convergence issue with Adam-based optimizers. AMSGrad uses the maximum of past squared gradients $v_{t}$ rather than the exponential moving average to update the parameters:

$$m_{t} = \beta_{1} m_{t-1} + \left(1 - \beta_{1}\right) g_{t}$$

$$v_{t} = \beta_{2} v_{t-1} + \left(1 - \beta_{2}\right) g_{t}^{2}$$

$$\hat{v}_{t} = \max\left(\hat{v}_{t-1}, v_{t}\right)$$

$$\theta_{t+1} = \theta_{t} - \frac{\eta}{\sqrt{\hat{v}_{t}} + \epsilon} m_{t}$$
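
As a concrete illustration, here is a minimal NumPy sketch of one AMSGrad step following the update rules above. The function name `amsgrad_update` and the default hyperparameters (`lr=1e-3`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are illustrative assumptions borrowed from common Adam settings, not values prescribed by the paper:

```python
import numpy as np

def amsgrad_update(theta, g, m, v, v_hat,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # First-moment (mean) estimate of the gradient
    m = beta1 * m + (1 - beta1) * g
    # Second-moment (uncentered variance) estimate
    v = beta2 * v + (1 - beta2) * g ** 2
    # The key AMSGrad change: keep the elementwise maximum of past v_t,
    # so the effective step size eta / (sqrt(v_hat) + eps) never increases
    v_hat = np.maximum(v_hat, v)
    # Parameter update (note: no bias correction, matching the formulas above)
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_hat

# Usage sketch: minimize f(theta) = theta^2, whose gradient is 2 * theta
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
v_hat = np.zeros_like(theta)
for _ in range(2000):
    g = 2.0 * theta
    theta, m, v, v_hat = amsgrad_update(theta, g, m, v, v_hat, lr=0.05)
print(theta)  # approaches 0
```

The `np.maximum` line is what distinguishes this sketch from plain Adam: by dividing by the running maximum $\hat{v}_{t}$ instead of the current average $v_{t}$, the per-coordinate learning rate is monotonically non-increasing, which addresses the convergence counterexamples identified in the paper.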