What is: Natural Gradient Descent?
| Year | Data Source |
| --- | --- |
| 1998 | CC BY-SA - https://paperswithcode.com |
Natural Gradient Descent (NGD) is an approximate second-order optimisation method. It has an interpretation as optimising over a Riemannian manifold using an intrinsic distance metric, which implies the updates are invariant to transformations such as whitening. By using the positive semi-definite (PSD) Gauss-Newton matrix to approximate the (possibly negative definite) Hessian, NGD can often work better than exact second-order methods.
Given the gradient of $z$, $g = \nabla_z f(z)$, NGD computes the update as:

$$\Delta z = \alpha F^{-1} g$$

where the Fisher information matrix $F$ is defined as:

$$F = \mathbb{E}_{p(t|z)}\!\left[\nabla \ln p(t|z)\, \nabla \ln p(t|z)^{\top}\right]$$
The log-likelihood function $\ln p(t|z)$ typically corresponds to commonly used error functions such as the cross-entropy loss.
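As a concrete illustration, here is a minimal NumPy sketch of one NGD step for a toy softmax regression model. It uses the empirical Fisher, built from per-example log-likelihood gradients at the observed targets rather than samples from the model, plus a small damping term to keep $F$ invertible; the function name, data shapes, and hyperparameters (`alpha`, `damping`) are illustrative assumptions, not from the source.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ngd_step(W, X, T, alpha=0.1, damping=1e-3):
    """One natural gradient step: Delta w = alpha * F^{-1} g.

    W: (d, k) weights, X: (n, d) inputs, T: (n, k) one-hot targets.
    Uses the empirical Fisher E[grad ln p * grad ln p^T] (an assumption
    here; the true Fisher samples t from the model) with damping.
    """
    n = X.shape[0]
    P = softmax(X @ W)  # model predictions p(t | x)
    # Per-example gradient of ln p(t|x) w.r.t. the flattened weights:
    # for softmax regression this is outer(x_i, t_i - p_i).
    G = np.stack([np.outer(X[i], T[i] - P[i]).ravel() for i in range(n)])
    g = G.mean(axis=0)                     # average log-likelihood gradient
    F = (G.T @ G) / n                      # empirical Fisher matrix
    F += damping * np.eye(F.shape[0])      # damping keeps F well-conditioned
    delta = alpha * np.linalg.solve(F, g)  # F^{-1} g without explicit inversion
    return W + delta.reshape(W.shape)      # ascend the log-likelihood

# Usage: a few steps on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
T = np.eye(3)[rng.integers(0, 3, size=32)]  # one-hot targets
W = np.zeros((5, 3))
for _ in range(10):
    W = ngd_step(W, X, T)
```

Note that solving the linear system $F\,\Delta = g$ is preferable to forming $F^{-1}$ explicitly; for large models the Fisher is never materialised, and methods such as K-FAC or conjugate-gradient solves are used to approximate $F^{-1}g$ instead.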
Source: LOGAN
Image: Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks