
What is: AdaMax?

Source: Adam: A Method for Stochastic Optimization
Year: 2014
Data Source: CC BY-SA - https://paperswithcode.com

AdaMax is a generalisation of Adam from the $\ell_{2}$ norm to the $\ell_{\infty}$ norm. Define:

$$u_{t} = \beta^{\infty}_{2}v_{t-1} + \left(1-\beta^{\infty}_{2}\right)|g_{t}|^{\infty} = \max\left(\beta_{2}\cdot v_{t-1}, |g_{t}|\right)$$

We can plug this into the Adam update equation by replacing $\sqrt{\hat{v}_{t}} + \epsilon$ with $u_{t}$ to obtain the AdaMax update rule:

$$\theta_{t+1} = \theta_{t} - \frac{\eta}{u_{t}}\hat{m}_{t}$$

Common default values are $\eta = 0.002$, $\beta_{1} = 0.9$, and $\beta_{2} = 0.999$.
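
Below is a minimal NumPy sketch of one AdaMax step, using the default values above. The function name `adamax_update` and the small constant added to `u` to guard against division by zero (e.g. when early gradients are zero) are my own choices, not part of the original formulation.

```python
import numpy as np

def adamax_update(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999):
    """One AdaMax step; variable names follow the equations above."""
    # First-moment estimate with bias correction, as in Adam.
    m = beta1 * m + (1 - beta1) * grad
    m_hat = m / (1 - beta1 ** t)
    # Infinity-norm based second moment: u_t = max(beta2 * u_{t-1}, |g_t|).
    u = np.maximum(beta2 * u, np.abs(grad))
    # AdaMax update: theta_{t+1} = theta_t - (eta / u_t) * m_hat_t.
    theta = theta - lr * m_hat / (u + 1e-8)  # 1e-8 is an added safeguard, not in the original rule
    return theta, m, u

# Usage: minimize f(x) = x^2 starting from x = 5.
theta, m, u = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    grad = 2 * theta
    theta, m, u = adamax_update(theta, grad, m, u, t)
print(theta)  # close to the minimum at 0
```

Because `u` tracks the infinity norm of past gradients, the effective step size is bounded by `lr`, which is why AdaMax tolerates the relatively small default of 0.002.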