What is SGDW?

Source: Decoupled Weight Decay Regularization
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

SGDW is a stochastic optimization technique that decouples weight decay from the gradient update:

$$g_{t} = \nabla f_{t}\left(\theta_{t-1}\right) + \lambda\theta_{t-1}$$

$$m_{t} = \beta_{1}m_{t-1} + \eta_{t}\alpha g_{t}$$

$$\theta_{t} = \theta_{t-1} - m_{t} - \eta_{t}\lambda\theta_{t-1}$$
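In the paper's notation, $\alpha$ is the base learning rate, $\beta_{1}$ the momentum coefficient, $\lambda$ the weight-decay coefficient, and $\eta_{t}$ a schedule multiplier. Below is a minimal NumPy sketch of a single update step under these equations; the function name, default hyperparameters, and the toy quadratic loss are illustrative assumptions, not part of the source. The `decoupled` flag distinguishes the two places the $\lambda\theta_{t-1}$ term can enter: folded into the gradient (ordinary L2-regularized SGD with momentum) or applied directly in the parameter update, which is the decoupled SGDW scheme.

```python
import numpy as np

def sgdw_step(theta, m, grad, *, alpha=0.1, beta1=0.9, lam=1e-4, eta=1.0,
              decoupled=True):
    """One SGD(-W) step (hypothetical helper illustrating the update rule above).

    theta : parameter vector theta_{t-1}
    m     : momentum buffer m_{t-1}
    grad  : gradient of the loss f_t at theta (no regularization term included)
    alpha : learning rate, beta1 : momentum, lam : weight decay,
    eta   : schedule multiplier eta_t
    """
    g = grad.copy()
    if not decoupled:
        # Coupled (L2-regularized) variant: fold lambda * theta into the gradient.
        g = g + lam * theta
    # Momentum accumulation: m_t = beta1 * m_{t-1} + eta_t * alpha * g_t
    m = beta1 * m + eta * alpha * g
    # Parameter update; SGDW applies weight decay directly to the parameters:
    # theta_t = theta_{t-1} - m_t - eta_t * lambda * theta_{t-1}
    theta = theta - m - (eta * lam * theta if decoupled else 0.0)
    return theta, m


# Toy usage on the quadratic loss f(theta) = 0.5 * ||theta||^2 (assumed example),
# whose gradient at theta is simply theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
for _ in range(100):
    theta, m = sgdw_step(theta, m, grad=theta)
print(theta)  # approaches the minimum at zero
```

Keeping the weight-decay term out of the momentum buffer is the point of the decoupling: the decay shrinks the parameters at a rate set only by $\eta_{t}\lambda$, independent of the gradient-based step size.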