What is: LAMB?
Source | Large Batch Optimization for Deep Learning: Training BERT in 76 minutes |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
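LAMB is a layerwise adaptive large-batch optimization technique. It provides a strategy for adapting the learning rate in large-batch settings. LAMB uses Adam as the base algorithm and then forms an update as:

$$r_t = \frac{m_t}{\sqrt{v_t} + \epsilon}$$

$$x_{t+1}^{(i)} = x_t^{(i)} - \eta_t \frac{\phi\left(\|x_t^{(i)}\|\right)}{\|r_t^{(i)} + \lambda x_t^{(i)}\|} \left(r_t^{(i)} + \lambda x_t^{(i)}\right)$$

Here $m_t$ and $v_t$ are Adam's first and second moment estimates, $x_t^{(i)}$ are the parameters of layer $i$ at step $t$, $\eta_t$ is the learning rate, $\lambda$ is the weight decay coefficient, and $\phi$ is a scaling function applied to the layer's weight norm.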
Unlike LARS, the adaptivity of LAMB is two-fold: (i) per-dimension normalization with respect to the square root of the second moment used in Adam, and (ii) layerwise normalization, obtained from the layerwise adaptivity.
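
The two levels of adaptivity can be illustrated with a minimal, standard-library sketch of one LAMB step for a single layer. This is an illustration under stated assumptions, not the paper's reference implementation: the function name `lamb_step`, the flat-list parameter layout, and the default hyperparameters are all hypothetical; the scaling function is taken to be the identity; and Adam-style bias correction is applied to the moment estimates.

```python
import math


def lamb_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One LAMB-style update for a single layer (hypothetical helper).

    x, grad, m, v are flat lists of floats for one layer; t is the
    1-based step count. Returns the updated (x, m, v).
    """
    # Adam moment estimates (exponential moving averages).
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]

    # Bias correction, as in Adam (an assumption of this sketch).
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]

    # (i) Per-dimension normalization by the root of the second moment.
    r = [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]

    # Add the weight decay term to get the raw update direction.
    u = [ri + weight_decay * xi for ri, xi in zip(r, x)]

    # (ii) Layerwise normalization: scale by ||x|| / ||u||.
    # The scaling function is assumed to be the identity here.
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    u_norm = math.sqrt(sum(ui * ui for ui in u))
    trust = x_norm / u_norm if x_norm > 0 and u_norm > 0 else 1.0

    x = [xi - lr * trust * ui for xi, ui in zip(x, u)]
    return x, m, v


# Usage: a two-parameter "layer" with weight norm 5.
x0 = [3.0, 4.0]
x1, m1, v1 = lamb_step(x0, [1.0, -1.0], [0.0, 0.0], [0.0, 0.0], t=1, lr=0.1)
```

With the identity scaling function, the length of the step taken by a layer equals the learning rate times that layer's weight norm, regardless of the gradient's magnitude; in the usage above the step length is therefore about `0.1 * 5`.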