What is: Inverse Square Root Schedule?

Year: 2000
Data Source: CC BY-SA - https://paperswithcode.com

Inverse Square Root is a learning rate schedule $1 / \sqrt{\max\left(n, k\right)}$, where $n$ is the current training iteration and $k$ is the number of warm-up steps. This sets a constant learning rate of $1 / \sqrt{k}$ for the first $k$ steps, then decays the learning rate in proportion to $1 / \sqrt{n}$ until pre-training is over. The decay is polynomial rather than exponential: the learning rate halves each time the step count quadruples.
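
As a minimal sketch of the formula above, the schedule can be written as a small Python function. The names `inverse_sqrt_lr`, `step`, and `warmup_steps` are illustrative choices (they correspond to n and k) and do not come from any particular library; any base learning rate multiplier is left out, since the definition here does not include one.

```python
import math


def inverse_sqrt_lr(step: int, warmup_steps: int) -> float:
    """Return 1 / sqrt(max(step, warmup_steps)).

    `step` plays the role of n (current training iteration) and
    `warmup_steps` the role of k (number of warm-up steps).
    Both names are hypothetical, chosen for this sketch.
    """
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    # For step <= warmup_steps the max() pins the denominator at sqrt(k),
    # giving a constant rate; afterwards the rate falls off as 1 / sqrt(n).
    return 1.0 / math.sqrt(max(step, warmup_steps))


if __name__ == "__main__":
    k = 10_000  # assumed warm-up length, for illustration only
    for n in (1, 5_000, 10_000, 40_000, 160_000):
        print(n, inverse_sqrt_lr(n, k))
```

With the assumed `k = 10_000`, the rate stays at 0.01 through step 10,000, drops to 0.005 at step 40,000, and to 0.0025 at step 160,000, which illustrates the halving per fourfold increase in steps.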