
What is: Nesterov Accelerated Gradient?

Year: 1983
Data Source: CC BY-SA - https://paperswithcode.com

Nesterov Accelerated Gradient is a momentum-based SGD optimizer that "looks ahead" to where the parameters will be and evaluates the gradient at that approximate future position, rather than at the current position:

$$v_{t} = \gamma v_{t-1} + \eta \nabla_{\theta} J\left(\theta - \gamma v_{t-1}\right)$$

$$\theta_{t} = \theta_{t-1} - v_{t}$$

Like SGD with momentum, $\gamma$ is usually set to a value of around $0.9$.
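As a concrete illustration, here is a minimal NumPy sketch of the update rule above. The quadratic objective, the `grad` function, the learning rate, and the iteration count are all illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def nag_step(theta, v, grad, lr=0.01, gamma=0.9):
    """One Nesterov Accelerated Gradient step.

    Evaluates the gradient at the look-ahead point theta - gamma * v
    instead of at theta itself, then applies the momentum update.
    """
    lookahead = theta - gamma * v          # where momentum is about to carry us
    v = gamma * v + lr * grad(lookahead)   # accumulate the gradient measured there
    theta = theta - v                      # step using the corrected velocity
    return theta, v

# Illustrative example: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
theta = np.array([5.0, -3.0])
v = np.zeros_like(theta)
for _ in range(100):
    theta, v = nag_step(theta, v, grad=lambda t: 2 * t)
print(theta)  # approaches [0, 0]
```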

The intuition is that the standard momentum method first computes the gradient at the current location and then takes a big jump in the direction of the updated accumulated gradient. In contrast, Nesterov momentum first makes a big jump in the direction of the previous accumulated gradient, then measures the gradient where it ends up and makes a correction. The idea is that it is better to correct a mistake after you have made it.
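For comparison with the `nag_step` sketch above, a classical momentum step swaps the order: it measures the gradient at the current parameters first, then jumps. This sketch follows the same hedged conventions as before, with the gradient function supplied by the caller.

```python
def momentum_step(theta, v, grad, lr=0.01, gamma=0.9):
    # Classical momentum: measure the gradient at the current parameters
    # first, then take the jump with the updated accumulated velocity.
    v = gamma * v + lr * grad(theta)
    theta = theta - v
    return theta, v
```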

Image Source: Geoff Hinton lecture notes