
What is: Momentumized, adaptive, dual averaged gradient?

Source: Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

The MADGRAD method applies a series of modifications to AdaGrad-DA (the dual-averaging form of AdaGrad) to improve its performance on deep learning optimization problems. It achieves state-of-the-art generalization performance across a diverse set of problems, including those on which Adam typically underperforms.
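
At a high level, MADGRAD combines dual averaging anchored at the initial point, an adaptive denominator built from a weighted sum of squared gradients (using a cube root rather than AdaGrad's square root), and momentum applied through averaging of iterates. The sketch below illustrates one such update step in plain NumPy. It is a simplified reading of the paper's update rule, not the reference implementation: it omits details such as weight decay and sparse-gradient handling, and the function name `madgrad_step` and the hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def madgrad_step(x, x0, grad, s, nu, k, lr=1e-2, momentum=0.9, eps=1e-6):
    """One MADGRAD-style parameter update (illustrative sketch).

    x    -- current parameters
    x0   -- initial parameters (dual averaging is anchored at the start point)
    grad -- stochastic gradient evaluated at x
    s    -- running weighted sum of gradients
    nu   -- running weighted sum of squared gradients
    k    -- 0-based step counter
    """
    lam = lr * np.sqrt(k + 1)          # increasing dual-averaging weight
    s = s + lam * grad                 # accumulate weighted gradients
    nu = nu + lam * grad * grad        # accumulate weighted squared gradients
    z = x0 - s / (np.cbrt(nu) + eps)   # adaptive dual-averaging iterate (cube root)
    c = 1.0 - momentum
    x = (1.0 - c) * x + c * z          # momentum via averaging of iterates
    return x, s, nu


# Toy usage (illustrative): minimize f(x) = ||x||^2 from a fixed start point.
x0 = np.array([5.0, -3.0])
x, s, nu = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
for k in range(200):
    grad = 2.0 * x                     # gradient of the quadratic
    x, s, nu = madgrad_step(x, x0, grad, s, nu, k, lr=0.1)
print(x)                               # approaches the minimizer [0, 0]
```

The main departures from plain AdaGrad-DA visible in this sketch are the growing per-step weights, the cube-root denominator, and the momentum obtained by averaging the dual-averaging iterate with the current parameters.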