What is: Adam?
Source | Adam: A Method for Stochastic Optimization |
Year | 2014 |
Data Source | CC BY-SA - https://paperswithcode.com |
Adam is an adaptive learning rate optimization algorithm that utilises both momentum and scaling, combining the benefits of RMSProp and SGD with Momentum. The optimizer is designed to be appropriate for non-stationary objectives and for problems with very noisy and/or sparse gradients.
The weight updates are performed as:

$$w_{t} = w_{t-1} - \eta \frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon}$$

with

$$\hat{m}_{t} = \frac{m_{t}}{1 - \beta_{1}^{t}}$$

$$\hat{v}_{t} = \frac{v_{t}}{1 - \beta_{2}^{t}}$$

$$m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) g_{t}$$

$$v_{t} = \beta_{2} v_{t-1} + (1 - \beta_{2}) g_{t}^{2}$$
Here $g_{t}$ is the gradient of the objective at step $t$, and $m_{t}$ and $v_{t}$ are exponential moving averages of the gradient and the squared gradient. $\eta$ is the step size/learning rate, around 1e-3 in the original paper. $\epsilon$ is a small number, typically 1e-8 or 1e-10, to prevent dividing by zero. $\beta_{1}$ and $\beta_{2}$ are forgetting parameters, with typical values 0.9 and 0.999, respectively.
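To make the update concrete, here is a minimal sketch of one Adam step in plain NumPy, following the equations above. The function name `adam_step`, the toy quadratic objective, and the default hyperparameter values are illustrative choices, not code from the original paper.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter array w given its gradient grad.

    m and v are the running (biased) first and second moment estimates,
    t is the 1-based step counter. Returns the updated (w, m, v).
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    # Bias correction, since m and v are initialised at zero.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Scaled parameter update.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Illustrative usage on a toy objective f(w) = ||w||^2 / 2, whose gradient is w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 201):
    grad = w                      # gradient of the toy objective at w
    w, m, v = adam_step(w, grad, m, v, t)
print(w)  # close to the minimiser [0, 0]
```

The bias-correction terms matter most during the first few steps, when the zero-initialised moving averages would otherwise underestimate the true moment magnitudes.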