What is: Inverse Square Root Schedule?
Year | 2000 |
Data Source | CC BY-SA - https://paperswithcode.com |
Inverse Square Root is a learning rate schedule $1 / \sqrt{\max(n, k)}$, where $n$ is the current training iteration and $k$ is the number of warm-up steps. This sets a constant learning rate for the first $k$ steps, then decays the learning rate in proportion to $1 / \sqrt{n}$ until pre-training is over.
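As a minimal sketch, assuming the schedule takes the form 1 / sqrt(max(n, k)) with n the current training iteration and k the number of warm-up steps, it could be computed as follows. The function name `inverse_sqrt_lr` and the optional `scale` multiplier are illustrative assumptions, not part of the definition above.

```python
import math


def inverse_sqrt_lr(n: int, k: int, scale: float = 1.0) -> float:
    """Return scale / sqrt(max(n, k)).

    n     -- current training iteration
    k     -- number of warm-up steps (must be positive)
    scale -- hypothetical multiplier for the base learning rate (assumption)
    """
    if k <= 0:
        raise ValueError("k must be a positive number of warm-up steps")
    # For n <= k the max() clamps to k, so the rate stays constant.
    # For n > k the rate falls off as 1 / sqrt(n).
    return scale / math.sqrt(max(n, k))


if __name__ == "__main__":
    for n in (1, 50, 100, 400, 1600):
        print(n, inverse_sqrt_lr(n, k=100))
```

With k = 100, the rate holds at 1 / sqrt(100) for iterations 1 through 100, then halves each time n quadruples (for example from n = 400 to n = 1600), which is a polynomial decay rather than an exponential one.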