
What is: 1-bit Adam?

Source: 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
Data Source: CC BY-SA -

1-bit Adam is a stochastic optimization technique: a variant of Adam with error-compensated 1-bit compression, based on the finding that Adam's variance term becomes stable at an early stage of training. Vanilla Adam is first used for a few epochs as a warm-up. After the warm-up stage, the compression stage starts: we stop updating the variance term $\mathbf{v}$ and use it as a fixed preconditioner. During the compression stage, we communicate the momentum after applying error-compensated 1-bit compression. The momentum is quantized into a 1-bit representation (the sign of each element). Alongside the sign vector, a scaling factor is computed as $\frac{\text{magnitude of compensated gradient}}{\text{magnitude of quantized gradient}}$. This scaling factor ensures that the compressed momentum has the same magnitude as the uncompressed momentum. This 1-bit compression could reduce the communication cost by 97% and 94% compared to the original float32 and float16 training, respectively.
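The compression stage described above can be illustrated with a minimal single-worker sketch in plain Python. It is not the reference implementation: the function names `one_bit_compress` and `compression_stage_step`, the default hyperparameters, and the use of the L2 norm as the "magnitude" are all assumptions made for illustration, and the cross-worker averaging of a real distributed setup is left out.

```python
import math

def one_bit_compress(momentum, error):
    """Error-compensated 1-bit compression of a non-empty list of floats.

    Hypothetical helper; "magnitude" is assumed to mean the L2 norm.
    """
    # Add back the residual that earlier compression steps lost.
    compensated = [m + e for m, e in zip(momentum, error)]
    # 1-bit representation: keep only the sign of each element.
    signs = [1.0 if c >= 0 else -1.0 for c in compensated]
    # Scaling factor: norm of the compensated vector over norm of the sign vector.
    norm_compensated = math.sqrt(sum(c * c for c in compensated))
    norm_signs = math.sqrt(len(signs))  # each sign has absolute value 1
    scale = norm_compensated / norm_signs
    compressed = [scale * s for s in signs]
    # Residual carried over to the next step (the error compensation).
    new_error = [c - q for c, q in zip(compensated, compressed)]
    return compressed, new_error

def compression_stage_step(params, grads, m, v_frozen, error,
                           lr=1e-3, beta1=0.9, eps=1e-8):
    """One update after warm-up: v_frozen is never modified (assumed sketch)."""
    # Usual Adam momentum update.
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, grads)]
    # In distributed training this compressed momentum is what gets communicated.
    m_hat, error = one_bit_compress(m, error)
    # The frozen variance term acts as a fixed preconditioner.
    params = [p - lr * mh / (math.sqrt(v) + eps)
              for p, mh, v in zip(params, m_hat, v_frozen)]
    return params, m_hat, error

# Example: compress a small momentum vector with no accumulated error yet.
momentum = [0.5, -1.5, 2.0, -0.1]
compressed, residual = one_bit_compress(momentum, [0.0] * len(momentum))
print(compressed, residual)
```

Under these assumptions the compressed vector has the same L2 norm as the compensated momentum, and the residual records exactly what the sign quantization discarded so it can be added back on the next step. Sending one bit per element instead of 32 or 16 bits removes 1 - 1/32 ≈ 96.9% or 1 - 1/16 ≈ 93.8% of the payload (ignoring the single scaling factor), which is consistent with the rounded 97% and 94% figures.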