
What is: SGDW?

Source: Decoupled Weight Decay Regularization
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

SGDW is a stochastic gradient descent variant with momentum that decouples the weight decay term from the gradient-based update:

$$g_t = \nabla f_t(\theta_{t-1}) + \lambda\theta_{t-1}$$

$$m_t = \beta_1 m_{t-1} + \eta_t \alpha g_t$$

$$\theta_t = \theta_{t-1} - m_t - \eta_t \lambda \theta_{t-1}$$
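The update rule above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation; the function name `sgdw_step` and its argument names are my own. Note that in the decoupled (SGDW) variant of the paper's algorithm, the L2 term $\lambda\theta_{t-1}$ is dropped from $g_t$ and weight decay is applied only through the final $\eta_t\lambda\theta_{t-1}$ term, which is what the sketch does.

```python
import numpy as np

def sgdw_step(theta, grad, m, alpha, beta1, lam, eta):
    """One SGDW step (illustrative sketch; names are assumptions).

    theta : parameters theta_{t-1}
    grad  : loss gradient  nabla f_t(theta_{t-1})
    m     : momentum buffer m_{t-1}
    alpha : base learning rate
    beta1 : momentum coefficient
    lam   : weight decay lambda
    eta   : schedule multiplier eta_t (e.g. from a cosine schedule)
    """
    g = grad                              # decoupled: no lam * theta term in g_t
    m = beta1 * m + eta * alpha * g       # m_t = beta1 * m_{t-1} + eta_t * alpha * g_t
    theta = theta - m - eta * lam * theta # decay applied directly to the weights
    return theta, m
```

Because the decay term is scaled by $\eta_t$ but not by $\alpha$, the effective amount of weight decay stays tied to the learning-rate schedule rather than to the gradient magnitudes, which is the point of the decoupling.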