
What is: AdaMod?

Source: An Adaptive and Momental Bound Method for Stochastic Learning
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

AdaMod is a stochastic optimizer that restricts adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpected large learning rates and stabilize the training of deep neural networks.

The weight updates are performed as:

$$g_{t} = \nabla f_{t}\left(\theta_{t-1}\right)$$

$$m_{t} = \beta_{1}m_{t-1} + \left(1-\beta_{1}\right)g_{t}$$

$$v_{t} = \beta_{2}v_{t-1} + \left(1-\beta_{2}\right)g_{t}^{2}$$

$$\hat{m}_{t} = m_{t} / \left(1 - \beta_{1}^{t}\right)$$

$$\hat{v}_{t} = v_{t} / \left(1 - \beta_{2}^{t}\right)$$

$$\eta_{t} = \alpha_{t} / \left(\sqrt{\hat{v}_{t}} + \epsilon\right)$$

$$s_{t} = \beta_{3}s_{t-1} + \left(1-\beta_{3}\right)\eta_{t}$$

$$\hat{\eta}_{t} = \min\left(\eta_{t}, s_{t}\right)$$

$$\theta_{t} = \theta_{t-1} - \hat{\eta}_{t}\hat{m}_{t}$$
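The update rule above can be sketched in NumPy. This is a minimal single-parameter illustration, not the authors' reference implementation; the function names (`adamod_step`, `init_state`) and the dictionary-based state are assumptions made here for clarity, and the default hyperparameters mirror common Adam settings with `beta3` controlling the learning-rate EMA:

```python
import numpy as np

def init_state(shape):
    """State for one parameter tensor: step count, Adam moments m and v,
    and the exponential moving average s of the adaptive learning rates."""
    return {"t": 0, "m": np.zeros(shape), "v": np.zeros(shape),
            "s": np.zeros(shape)}

def adamod_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                beta3=0.999, eps=1e-8):
    """One AdaMod update for a single parameter array (NumPy sketch)."""
    state["t"] += 1
    t = state["t"]
    # First and second moment estimates, with bias correction (as in Adam)
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    # Element-wise adaptive learning rate eta_t
    eta = lr / (np.sqrt(v_hat) + eps)
    # Momental bound: EMA of the learning rates themselves
    state["s"] = beta3 * state["s"] + (1 - beta3) * eta
    # Clip each learning rate by its own moving average
    eta_hat = np.minimum(eta, state["s"])
    return theta - eta_hat * m_hat
```

Because `s` starts at zero and `beta3` is close to one, the bound grows slowly, which suppresses the unexpectedly large per-element learning rates Adam can produce early in training.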