
What is: Natural Gradient Descent?

Year: 1998
Data Source: CC BY-SA - https://paperswithcode.com

Natural Gradient Descent (NGD) is an approximate second-order optimization method. It can be interpreted as optimizing over a Riemannian manifold using an intrinsic distance metric, which implies that its updates are invariant to transformations such as whitening. Because it uses the positive semi-definite (PSD) Gauss-Newton matrix to approximate the (possibly negative definite) Hessian, NGD can often work better than exact second-order methods.

Given the gradient of $f$ with respect to $z$, $g = \frac{\partial f\left(z\right)}{\partial z}$, NGD computes the update as:

$$\Delta{z} = \alpha F^{-1} g$$
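
As a minimal sketch of this update in NumPy (the step size and the values of $g$ and $F$ below are arbitrary illustrations, not from the source), the product $F^{-1}g$ is computed by solving a linear system rather than forming the inverse explicitly:

```python
import numpy as np

# Illustrative natural-gradient update: Delta_z = alpha * F^{-1} g.
alpha = 0.1                              # step size (assumed value)
g = np.array([0.5, -1.0])                # gradient of f at z (example values)
F = np.array([[2.0, 0.3],
              [0.3, 1.0]])               # Fisher information matrix (example)

# Solve F x = g instead of inverting F, which is cheaper and more stable.
delta_z = alpha * np.linalg.solve(F, g)
print(delta_z)
```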

where the Fisher information matrix $F$ is defined as:

$$F = \mathbb{E}_{p\left(t\mid z\right)}\left[\nabla\ln p\left(t\mid z\right)\, \nabla\ln p\left(t\mid z\right)^{T}\right]$$

The log-likelihood function $\ln p\left(t\mid z\right)$ typically corresponds to commonly used error functions such as the cross-entropy loss.
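
To make the Fisher matrix concrete, here is a hedged sketch (the softmax model, sample count, damping term, and all numeric values are assumptions for illustration, not from the source) that estimates $F$ by Monte Carlo, sampling $t \sim p\left(t\mid z\right)$ from a softmax output and averaging the outer products of $\nabla\ln p\left(t\mid z\right)$, then applies the update above:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_p(z, t):
    # For p(t|z) = softmax(z)_t, the gradient w.r.t. the logits z is
    # grad_z ln p(t|z) = onehot(t) - softmax(z).
    onehot = np.zeros_like(z)
    onehot[t] = 1.0
    return onehot - softmax(z)

z = np.array([0.2, -0.5, 1.0])           # logits (example values)
p = softmax(z)

# Monte-Carlo estimate of F = E_{p(t|z)}[grad ln p(t|z) grad ln p(t|z)^T],
# sampling labels t from the model's own distribution.
samples = rng.choice(len(z), size=10_000, p=p)
grads = np.stack([grad_log_p(z, t) for t in samples])
F = grads.T @ grads / len(samples)

# Natural-gradient step for some loss gradient g (example values), with a
# small damping term because the softmax Fisher is singular over the logits.
g = np.array([0.1, -0.3, 0.2])
alpha, damping = 0.1, 1e-3
delta_z = alpha * np.linalg.solve(F + damping * np.eye(len(z)), g)
print(delta_z)
```

The small damping term is needed because the exact Fisher of a softmax with respect to its logits, $\mathrm{diag}(p) - pp^{T}$, is singular, so the undamped system cannot be solved reliably.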

Source: LOGAN

Image: Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks