
What is: Demon?

Source: Demon: Improved Neural Network Training with Momentum Decay
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Decaying Momentum, or Demon, is a stochastic optimizer motivated by decaying the total contribution of a gradient to all future updates. Decaying the momentum parameter decays this total contribution. A particular gradient term $g_{t}$ contributes a total of $\eta\sum_{i}\beta^{i}$ of its "energy" to all future gradient updates, which results in the geometric sum $\sum_{i=1}^{\infty}\beta^{i} = \beta\sum_{i=0}^{\infty}\beta^{i} = \frac{\beta}{1-\beta}$. Decaying this sum yields the Demon algorithm. Letting $\beta_{init}$ be the initial $\beta$, at the current step $t$ of $T$ total steps, the decay routine is given by solving the following for $\beta_{t}$:

$$\frac{\beta_{t}}{1-\beta_{t}} = \left(1 - \frac{t}{T}\right)\frac{\beta_{init}}{1-\beta_{init}}$$

where $\left(1 - t/T\right)$ is the proportion of iterations remaining. Note that Demon typically requires no hyperparameter tuning, as the momentum is usually decayed to $0$ or a small negative value at time $T$. Improved performance is observed by delaying the start of the decay. Demon can be applied to any gradient descent algorithm with a momentum parameter.
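
Solving the relation above for $\beta_{t}$ gives $\beta_{t} = z/(1+z)$ with $z = \left(1 - t/T\right)\beta_{init}/(1-\beta_{init})$. Below is a minimal sketch of how this schedule could be plugged into plain SGD with momentum; the function names, learning rate, and toy gradient are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def demon_beta(t, T, beta_init=0.9):
    """Solve beta_t / (1 - beta_t) = (1 - t/T) * beta_init / (1 - beta_init)."""
    z = (1.0 - t / T) * beta_init / (1.0 - beta_init)
    return z / (1.0 + z)

def demon_sgd_momentum(grad_fn, w0, lr=0.01, beta_init=0.9, T=1000):
    """Sketch of SGD with momentum where beta is decayed by the Demon schedule.
    (Hypothetical helper, not the paper's reference code.)"""
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for t in range(T):
        beta_t = demon_beta(t, T, beta_init)  # decays from beta_init at t=0 to 0 at t=T
        g = grad_fn(w)
        v = beta_t * v + g   # momentum buffer with decayed beta
        w = w - lr * v       # parameter update
    return w

# Usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w_opt = demon_sgd_momentum(lambda w: 2.0 * w, w0=[5.0, -3.0])
print(w_opt)  # converges toward zero
```

At $t = 0$ the schedule returns $\beta_{init}$ exactly, and at $t = T$ it returns $0$, matching the description above; the same `demon_beta` schedule could equally be applied to other momentum-based optimizers such as Adam.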