What is: Natural Gradient Descent?
| Year | Data Source |
| --- | --- |
| 1998 | CC BY-SA - https://paperswithcode.com |
Natural Gradient Descent (NGD) is an approximate second-order optimisation method. It has an interpretation as optimising over a Riemannian manifold using an intrinsic distance metric, which implies the updates are invariant to transformations such as whitening. By using the positive semi-definite (PSD) Gauss-Newton matrix to approximate the (possibly negative definite) Hessian, NGD can often work better than exact second-order methods.
Given the gradient of $z$, $g = \nabla_z f(z)$, NGD computes the update as:

$$\Delta z = \alpha F^{-1} g$$

where the Fisher information matrix $F$ is defined as:

$$F = \mathbb{E}_{p(t|z)}\!\left[\nabla \ln p(t|z)\, \nabla \ln p(t|z)^{\top}\right]$$
The log-likelihood function $\ln p(t|z)$ typically corresponds to commonly used error functions such as the cross-entropy loss.
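As a concrete illustration, here is a minimal NumPy sketch of one NGD step for a toy softmax regression model. It uses the empirical Fisher, built from per-example log-likelihood gradients at the observed targets rather than samples from the model, plus a small damping term to keep $F$ invertible; the function name, data shapes, and hyperparameters (`alpha`, `damping`) are illustrative assumptions, not from the source.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ngd_step(W, X, T, alpha=0.1, damping=1e-3):
    """One natural gradient step: Delta w = alpha * F^{-1} g.

    W: (d, k) weights, X: (n, d) inputs, T: (n, k) one-hot targets.
    Uses the empirical Fisher E[grad ln p * grad ln p^T] (an assumption
    here; the true Fisher samples t from the model) with damping.
    """
    n = X.shape[0]
    P = softmax(X @ W)  # model predictions p(t | x)
    # Per-example gradient of ln p(t|x) w.r.t. the flattened weights:
    # for softmax regression this is outer(x_i, t_i - p_i).
    G = np.stack([np.outer(X[i], T[i] - P[i]).ravel() for i in range(n)])
    g = G.mean(axis=0)                     # average log-likelihood gradient
    F = (G.T @ G) / n                      # empirical Fisher matrix
    F += damping * np.eye(F.shape[0])      # damping keeps F well-conditioned
    delta = alpha * np.linalg.solve(F, g)  # F^{-1} g without explicit inversion
    return W + delta.reshape(W.shape)      # ascend the log-likelihood

# Usage: a few steps on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
T = np.eye(3)[rng.integers(0, 3, size=32)]  # one-hot targets
W = np.zeros((5, 3))
for _ in range(10):
    W = ngd_step(W, X, T)
```

Note that solving the linear system $F\,\Delta = g$ is preferable to forming $F^{-1}$ explicitly; for large models the Fisher is never materialised, and methods such as K-FAC or conjugate-gradient solves are used to approximate $F^{-1}g$ instead.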
Source: LOGAN
Image: Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks