What is: AdamW?
Source | Decoupled Weight Decay Regularization |
Year | 2017 |
Data Source | CC BY-SA - https://paperswithcode.com |
AdamW is a stochastic optimization method that modifies the typical implementation of weight decay in Adam, by decoupling weight decay from the gradient update. To see this, L2 regularization in Adam is usually implemented with the below modification, where $w_t$ is the rate of the weight decay at time $t$:

$$ g_{t} = \nabla f\left(\theta_{t}\right) + w_{t}\theta_{t} $$
while AdamW adjusts the weight decay term to appear in the gradient update:

$$ \theta_{t+1, i} = \theta_{t, i} - \eta\left(\frac{1}{\sqrt{\hat{v}_{t}} + \epsilon}\cdot\hat{m}_{t} + w_{t, i}\theta_{t, i}\right), \forall t $$
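The decoupling can be sketched as a single update step in NumPy. This is a minimal illustration, not the reference implementation; the function name `adamw_step` and the toy hyperparameter values are assumptions for the example.

```python
import numpy as np

def adamw_step(theta, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update (illustrative sketch).

    Unlike Adam with L2 regularization, the weight decay term is NOT added
    to the gradient before the moment estimates; it is applied directly to
    the parameters inside the final update, outside the adaptive scaling.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: theta is shrunk alongside the adaptive step.
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

# Usage on a toy quadratic f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 101):
    theta, m, v = adamw_step(theta, m, v, grad=theta, t=t)
```

After 100 steps the parameters have moved toward the minimum at zero; had the decay term been folded into `grad` instead, it would also have been rescaled by the adaptive denominator, which is exactly the coupling AdamW removes.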