What is Nesterov Accelerated Gradient?
Year: 1983
Data Source: CC BY-SA - https://paperswithcode.com
Nesterov Accelerated Gradient is a momentum-based SGD optimizer that "looks ahead" to where the parameters will be to calculate the gradient ex post rather than ex ante:

$$ v_t = \gamma v_{t-1} + \eta \nabla_\theta J\left(\theta_{t-1} - \gamma v_{t-1}\right) $$

$$ \theta_t = \theta_{t-1} - v_t $$

As with SGD with momentum, the momentum coefficient $\gamma$ is usually set to around $0.9$.
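To make the update concrete, here is a minimal NumPy sketch of a single NAG step; the names `nag_step`, `grad_fn`, `lr`, and `momentum` are illustrative, not from any particular library:

```python
import numpy as np

def nag_step(theta, v, grad_fn, lr=0.01, momentum=0.9):
    """One Nesterov accelerated gradient update.

    theta:   current parameter vector
    v:       accumulated velocity from previous steps
    grad_fn: callable returning the gradient of the objective
             at a given parameter vector
    """
    # The gradient is evaluated at the looked-ahead point
    # theta - gamma * v, roughly where the momentum step will land.
    g = grad_fn(theta - momentum * v)
    v = momentum * v + lr * g   # v_t = gamma * v_{t-1} + eta * gradient
    return theta - v, v         # theta_t = theta_{t-1} - v_t

# Usage: minimize f(x) = ||x||^2, whose gradient is 2x.
theta, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(100):
    theta, v = nag_step(theta, v, grad_fn=lambda x: 2 * x, lr=0.05)
print(theta)  # approaches [0, 0]
```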
The intuition is that the standard momentum method first computes the gradient at the current location and then takes a big jump in the direction of the updated accumulated gradient. In contrast, Nesterov momentum first makes a big jump in the direction of the previous accumulated gradient, then measures the gradient where it ends up and makes a correction. The idea is that it is better to correct a mistake after you have made it.
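The difference in the order of operations shows up clearly when the two updates are placed side by side. This is a sketch under the same illustrative naming assumptions as above, not a reference implementation:

```python
def classical_momentum_step(theta, v, grad_fn, lr=0.01, momentum=0.9):
    # Classical momentum: measure the gradient at the CURRENT location,
    # then take the big jump along the updated accumulated gradient.
    v = momentum * v + lr * grad_fn(theta)
    return theta - v, v

def nesterov_step(theta, v, grad_fn, lr=0.01, momentum=0.9):
    # Nesterov momentum: take the big jump along the previous velocity
    # first, then measure the gradient where it lands and fold that
    # correction into the same step.
    v = momentum * v + lr * grad_fn(theta - momentum * v)
    return theta - v, v
```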
[Figure: comparison of the standard momentum and Nesterov momentum updates. Image source: Geoff Hinton's lecture notes]