
What is: AdaSqrt?

Source: Second-order Information in First-order Optimization Methods
Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

AdaSqrt is a stochastic optimization technique that is motivated by the observation that methods like Adagrad and Adam can be viewed as relaxations of Natural Gradient Descent.

The updates are performed as follows:

$$t \leftarrow t + 1$$

$$\alpha_{t} \leftarrow \sqrt{t}$$

$$g_{t} \leftarrow \nabla_{\theta}f\left(\theta_{t-1}\right)$$

$$S_{t} \leftarrow S_{t-1} + g_{t}^{2}$$

$$\theta_{t+1} \leftarrow \theta_{t} + \eta\frac{\alpha_{t}g_{t}}{S_{t} + \epsilon}$$
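Below is a minimal sketch of these update rules in Python/NumPy, assuming elementwise array operations and a gradient supplied by the caller; the class name, hyperparameter defaults, and the `step` interface are illustrative rather than part of the original method description.

```python
import numpy as np

class AdaSqrt:
    """Sketch of the AdaSqrt update rules listed above (assumed interface)."""

    def __init__(self, lr=0.01, eps=1e-8):
        self.lr = lr      # step size eta
        self.eps = eps    # small constant for numerical stability
        self.t = 0        # iteration counter
        self.S = None     # running sum of squared gradients

    def step(self, theta, grad):
        if self.S is None:
            self.S = np.zeros_like(theta)
        self.t += 1
        alpha = np.sqrt(self.t)     # alpha_t = sqrt(t)
        self.S += grad ** 2         # S_t = S_{t-1} + g_t^2
        # Parameter update as written above: eta * alpha_t * g_t / (S_t + eps)
        return theta + self.lr * alpha * grad / (self.S + self.eps)
```

Note that, unlike Adagrad, the accumulator $S_{t}$ enters the denominator without a square root; the $\sqrt{t}$ factor in the numerator rescales the step size instead.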