What is: 1-bit Adam?
Source | 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
1-bit Adam is a stochastic optimization technique, a variant of Adam with error-compensated 1-bit compression, based on the finding that Adam's variance term becomes stable at an early stage of training. First, vanilla Adam is run for a few epochs as a warm-up. After the warm-up stage, the compression stage starts: the variance term is no longer updated and is instead used as a fixed preconditioner.

During the compression stage, communication is based on the momentum, using error-compensated 1-bit compression: the momentum is quantized into a 1-bit representation (the sign of each element). Accompanying the sign vector, a scaling factor is computed as the ratio between the norm of the uncompressed momentum and the norm of its sign vector. This scaling factor ensures that the compressed momentum has the same magnitude as the uncompressed momentum. This 1-bit compression can reduce the per-element communication cost by 32× and 16× compared to the original float32 and float16 training, respectively.
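To make the compression stage concrete, below is a minimal single-worker sketch in NumPy. It assumes the warm-up has already produced a frozen variance term (here called `v_frozen`); all names are illustrative, and this is a sketch of the idea under those assumptions, not the DeepSpeed implementation.

```python
import numpy as np

def onebit_compress(x):
    """1-bit quantization: keep only the sign of each element, plus one
    scalar scale chosen so the compressed vector keeps the same norm."""
    signs = np.where(x >= 0, 1.0, -1.0)                 # each element -> +/-1 (1 bit)
    scale = np.linalg.norm(x) / np.linalg.norm(signs)   # ||x||_2 / sqrt(d)
    return signs, scale

def compressed_momentum_step(params, grad, m, v_frozen, error,
                             lr=1e-3, beta1=0.9, eps=1e-8):
    """One compression-stage step: the momentum is updated as in Adam,
    quantized with error-compensated 1-bit compression, and the frozen
    variance term from warm-up acts as a fixed preconditioner."""
    m = beta1 * m + (1 - beta1) * grad          # usual first-moment update
    compensated = m + error                     # fold in last step's compression error
    signs, scale = onebit_compress(compensated)
    m_hat = scale * signs                       # what would actually be communicated
    error = compensated - m_hat                 # residual carried to the next step
    params = params - lr * m_hat / (np.sqrt(v_frozen) + eps)
    return params, m, error

# Tiny demonstration on a random problem.
rng = np.random.default_rng(0)
d = 8
params, m, error = rng.normal(size=d), np.zeros(d), np.zeros(d)
v_frozen = np.full(d, 0.01)                     # stand-in for the warmed-up variance
for _ in range(3):
    grad = rng.normal(size=d)
    params, m, error = compressed_momentum_step(params, grad, m, v_frozen, error)
```

In an actual data-parallel run, only the packed sign bits and the scalar scale would be exchanged between workers, which is where the roughly 32× and 16× per-element savings over float32 and float16 come from.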