What is: LAMB?

LAMB is a layerwise adaptive large-batch optimization technique. It provides a strategy for adapting the learning rate in large-batch settings. LAMB uses Adam as the base algorithm and forms the update for each layer $i$ as:

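$r_{t} = \frac{m_{t}}{\sqrt{v_{t}} + \epsilon}$

$x_{t+1}^{(i)} = x_{t}^{(i)} - \eta_{t}\frac{\phi\left(\lVert x_{t}^{(i)} \rVert\right)}{\lVert r_{t}^{(i)} + \lambda x_{t}^{(i)} \rVert}\left(r_{t}^{(i)} + \lambda x_{t}^{(i)}\right)$

where $m_{t}$ and $v_{t}$ are Adam's bias-corrected first and second moment estimates, $\eta_{t}$ is the learning rate, $\lambda$ is the weight decay coefficient, and $\phi$ is a scaling function applied to the norm of the layer's parameters.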
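The update can be sketched for a single layer in plain Python. This is a minimal illustration, not a reference implementation: it assumes Adam-style bias correction, takes $\phi$ to be the identity, uses illustrative default hyperparameters, and normalizes by the norm of the full update direction $r_t + \lambda x_t$, as in Algorithm 2 of the LAMB paper. The function name `lamb_layer_step` is hypothetical.

```python
import math
from typing import List


def lamb_layer_step(
    x: List[float],
    g: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-6,
    weight_decay: float = 0.01,
) -> List[float]:
    """One LAMB step for a single layer (hypothetical helper); returns x_{t+1}.

    m and v are the layer's Adam moment buffers and are updated in place.
    t is the 1-based step count. Default hyperparameters are assumptions.
    """
    # Adam moment estimates, updated coordinate by coordinate.
    for j, gj in enumerate(g):
        m[j] = beta1 * m[j] + (1.0 - beta1) * gj
        v[j] = beta2 * v[j] + (1.0 - beta2) * gj * gj
    # Bias correction as in Adam (assumed here).
    m_hat = [mj / (1.0 - beta1 ** t) for mj in m]
    v_hat = [vj / (1.0 - beta2 ** t) for vj in v]
    # r_t = m_t / (sqrt(v_t) + eps): per-dimension normalization.
    r = [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    # Update direction r_t + lambda * x_t (weight-decay term added to the direction).
    u = [rj + weight_decay * xj for rj, xj in zip(r, x)]
    x_norm = math.sqrt(sum(xj * xj for xj in x))
    u_norm = math.sqrt(sum(uj * uj for uj in u))
    # Layerwise ratio phi(||x||) / ||u|| with phi = identity (an assumption);
    # fall back to 1.0 when either norm is zero.
    trust_ratio = x_norm / u_norm if x_norm > 0.0 and u_norm > 0.0 else 1.0
    return [xj - lr * trust_ratio * uj for xj, uj in zip(x, u)]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    m, v = [0.0] * 3, [0.0] * 3
    for t in range(1, 4):
        g = [0.1, -0.2, 0.3]  # toy fixed gradient, for illustration only
        x = lamb_layer_step(x, g, m, v, t)
        print(t, x)
```

In a full optimizer this function would be applied separately to each layer (each parameter tensor), which is what makes the ratio layerwise. With $\phi$ as the identity, every step moves a layer by exactly $\eta_t \lVert x_t^{(i)} \rVert$ in norm, whatever the scale of its gradient.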

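Unlike LARS, which applies only layerwise normalization to an SGD-with-momentum update, LAMB's adaptivity is two-fold: (i) per-dimension normalization by the square root of the second moment, as in Adam, and (ii) layerwise normalization, which rescales each layer's update by the ratio of $\phi$ applied to the layer's parameter norm to the norm of the update itself.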
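The two kinds of normalization can be separated in a small sketch. It assumes the first step with bias correction (so $\hat m = g$ and $\hat v = g^2$), takes $\phi$ to be the identity, and omits the learning rate and weight decay for clarity. The helper names and the two toy layers are hypothetical. Part (i) makes each coordinate of $r$ close to the sign of its gradient regardless of magnitude; part (ii) then sets each layer's step norm to the layer's own parameter norm.

```python
import math
from typing import List


def l2_norm(z: List[float]) -> float:
    return math.sqrt(sum(a * a for a in z))


def per_dimension_ratio(m_hat: List[float], v_hat: List[float], eps: float = 1e-6) -> List[float]:
    """(i) Divide each coordinate by the square root of its own second moment."""
    return [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]


def layerwise_scale(x: List[float], u: List[float]) -> List[float]:
    """(ii) Rescale a whole layer's update so its norm equals phi(||x||), with phi = identity."""
    x_norm, u_norm = l2_norm(x), l2_norm(u)
    ratio = x_norm / u_norm if x_norm > 0.0 and u_norm > 0.0 else 1.0
    return [ratio * b for b in u]


# Two hypothetical layers whose gradients differ in scale by several orders of magnitude.
layers = {
    "small_grad": ([1.0, 2.0, 2.0], [1e-3, -2e-3, 5e-4]),
    "large_grad": ([0.3, -0.4], [5.0, 12.0]),
}
for name, (x, g) in layers.items():
    # At the first step with bias correction, m_hat = g and v_hat = g * g.
    r = per_dimension_ratio(g, [gi * gi for gi in g])
    step = layerwise_scale(x, r)
    print(name, "r =", [round(ri, 3) for ri in r],
          "||step|| =", round(l2_norm(step), 6), "||x|| =", round(l2_norm(x), 6))
```

Neither part alone gives this behavior: Adam's per-dimension normalization leaves the overall step size of a layer tied to its number of coordinates, while LARS's layerwise ratio alone leaves individual coordinates scaled by their raw gradients.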

Source	Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Year	2019
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software