What is: Adam?
| Source | Adam: A Method for Stochastic Optimization | 
| Year | 2014 | 
| Data Source | CC BY-SA - https://paperswithcode.com | 
Adam is an adaptive learning rate optimization algorithm that utilises both momentum and scaling, combining the benefits of RMSProp and SGD with Momentum. The optimizer is designed to be appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
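The weight updates are performed as:

$$ w_{t} = w_{t-1} - \eta \frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon} $$

with

$$ \hat{m}_{t} = \frac{m_{t}}{1 - \beta_{1}^{t}} $$

$$ \hat{v}_{t} = \frac{v_{t}}{1 - \beta_{2}^{t}} $$

$$ m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) g_{t} $$

$$ v_{t} = \beta_{2} v_{t-1} + (1 - \beta_{2}) g_{t}^{2} $$

Here $g_{t}$ is the gradient at step $t$, and $m_{t}$ and $v_{t}$ are running estimates of its first and second moments, both initialised to zero. $\eta$ is the step size/learning rate, around 1e-3 in the original paper. $\epsilon$ is a small number, typically 1e-8 or 1e-10, to prevent dividing by zero. $\beta_{1}$ and $\beta_{2}$ are forgetting parameters, with typical values 0.9 and 0.999, respectively.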
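
As a rough illustration, the update rule described above can be sketched in plain Python. This is a minimal sketch, not a reference implementation: the function names `adam_step` and `minimize_quadratic`, the list-based parameter layout, and the toy quadratic objective are all assumptions made for this example. The default hyperparameters follow the typical values quoted above.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of parameters (hypothetical helper).

    w: current parameters, g: gradients at w,
    m, v: running first and second moment estimates (start at zero),
    t: 1-based step counter, used for bias correction.
    Returns the updated (w, m, v) as new lists.
    """
    new_w, new_m, new_v = [], [], []
    for w_i, g_i, m_i, v_i in zip(w, g, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1 - beta1) * g_i
        v_i = beta2 * v_i + (1 - beta2) * g_i * g_i
        # Bias correction compensates for the zero initialisation.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter scaled step.
        new_w.append(w_i - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_w, new_m, new_v


def minimize_quadratic(steps=2000, lr=1e-2):
    """Toy usage: minimise f(w) = (w0 - 3)^2 + (w1 + 1)^2 (assumed objective)."""
    target = [3.0, -1.0]
    w = [0.0, 0.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        # Analytic gradient of the quadratic: 2 * (w - target).
        g = [2.0 * (w_i - c_i) for w_i, c_i in zip(w, target)]
        w, m, v = adam_step(w, g, m, v, t, lr=lr)
    return w
```

One property worth noticing: on the very first step the bias-corrected estimates reduce to the raw gradient and its square, so each parameter moves by roughly the learning rate in the direction opposite its gradient, regardless of the gradient's magnitude.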
