What is: Non-monotonically Triggered ASGD?

Source: Regularizing and Optimizing LSTM Language Models
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

NT-ASGD, or Non-monotonically Triggered ASGD, is an averaged stochastic gradient descent technique.

In regular ASGD, we take steps identical to regular SGD, but instead of returning the last iterate as the solution we return $\frac{1}{K-T+1}\sum_{i=T}^{K} w_i$, where $K$ is the total number of iterations and $T < K$ is a user-specified averaging trigger.
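To make the averaging step concrete, here is a minimal Python sketch that computes this average from a stored list of iterates. The function name `asgd_average` and the list-of-iterates representation are illustrative assumptions, not from the paper; practical implementations keep a running average instead of storing every iterate.

```python
def asgd_average(iterates, T):
    """Return the ASGD solution: the average of w_T, ..., w_K.

    `iterates` holds w_1, ..., w_K (1-indexed in the math,
    0-indexed here), and T is the user-specified averaging
    trigger with T < K.
    """
    K = len(iterates)               # total number of iterations
    tail = iterates[T - 1:]         # w_T, ..., w_K
    return sum(tail) / (K - T + 1)  # (1 / (K - T + 1)) * sum_{i=T}^{K} w_i


# Example: with iterates w_1..w_4 and trigger T = 3,
# the solution is the mean of w_3 and w_4.
print(asgd_average([1.0, 2.0, 3.0, 4.0], T=3))  # prints 3.5
```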

NT-ASGD uses a non-monotonic criterion that conservatively triggers averaging only when the validation metric fails to improve for multiple successive checks. Since the decision to start averaging is irreversible, this conservatism ensures that the randomness of training does not play a major role in the decision.
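As an illustration, the following Python sketch implements one plausible form of this trigger over a history of per-check validation losses. The function name `nt_asgd_trigger_point` is a hypothetical helper, and the non-monotone interval `n` is a tunable parameter (the paper reports using a value of 5).

```python
def nt_asgd_trigger_point(val_losses, n=5):
    """Return the index of the first check that triggers averaging, or None.

    Averaging triggers at check t when the current validation loss is worse
    than the best loss recorded more than n checks ago, i.e. the metric has
    failed to improve on that best value for over n consecutive checks.
    """
    for t, v in enumerate(val_losses):
        # Only compare once there is history older than the last n checks.
        if t > n and v > min(val_losses[: t - n]):
            return t  # irreversible: from here on, weights are averaged
    return None


# Example: the loss plateaus around 3.4; with n = 2 the trigger fires at
# check 6, where the loss (3.42) exceeds the best seen up to check 3 (3.4).
losses = [5.0, 4.0, 3.5, 3.4, 3.4, 3.41, 3.42, 3.43]
print(nt_asgd_trigger_point(losses, n=2))  # prints 6
```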