What is: Discriminative Fine-Tuning?
Source | Universal Language Model Fine-tuning for Text Classification |
Year | 2018 |
Data Source | CC BY-SA - https://paperswithcode.com |
Discriminative Fine-Tuning is a fine-tuning strategy used for ULMFiT-type models. Instead of using the same learning rate for all layers of the model, discriminative fine-tuning allows us to tune each layer with a different learning rate. For context, the regular stochastic gradient descent (SGD) update of a model's parameters $\theta$ at time step $t$ looks like the following (Ruder, 2016):

$$\theta_t = \theta_{t-1} - \eta \cdot \nabla_\theta J(\theta)$$
where $\eta$ is the learning rate and $\nabla_\theta J(\theta)$ is the gradient with regard to the model's objective function. For discriminative fine-tuning, we split the parameters $\theta$ into $\{\theta^1, \ldots, \theta^L\}$, where $\theta^l$ contains the parameters of the model at the $l$-th layer and $L$ is the number of layers of the model. Similarly, we obtain $\{\eta^1, \ldots, \eta^L\}$, where $\eta^l$ is the learning rate of the $l$-th layer. The SGD update with discriminative fine-tuning is then:

$$\theta_t^l = \theta_{t-1}^l - \eta^l \cdot \nabla_{\theta^l} J(\theta)$$
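The per-layer update above can be sketched in a few lines of plain Python. This is a hypothetical toy example, not the authors' implementation: parameters and gradients are kept as per-layer lists, and each layer is updated with its own learning rate $\eta^l$.

```python
# Hypothetical sketch of one discriminative fine-tuning SGD step.
# params[l], grads[l], etas[l] hold layer l's parameters, gradients,
# and learning rate; names here are illustrative only.

def discriminative_sgd_step(params, grads, etas):
    """Apply theta_t^l = theta_{t-1}^l - eta^l * grad^l for every layer l."""
    return [
        [p - lr * g for p, g in zip(layer_p, layer_g)]
        for layer_p, layer_g, lr in zip(params, grads, etas)
    ]

# Toy model with L = 3 layers and one scalar parameter per layer.
params = [[1.0], [1.0], [1.0]]
grads = [[0.5], [0.5], [0.5]]
etas = [0.001, 0.01, 0.1]  # lower layers get smaller learning rates

params = discriminative_sgd_step(params, grads, etas)
print(params)  # each layer moves by its own step eta^l * gradient
```

With identical gradients, the layers now differ only by how far each moved, which is the whole point of the technique: lower layers, which capture more general features, change more slowly than the task-specific top layer. In a framework such as PyTorch, the same effect is usually achieved by passing per-layer parameter groups with different `lr` values to the optimizer.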
The authors find that empirically it worked well to first choose the learning rate $\eta^L$ of the last layer by fine-tuning only the last layer, and then use $\eta^{l-1} = \eta^l / 2.6$ as the learning rate for lower layers.
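The $\eta^{l-1} = \eta^l / 2.6$ rule can be turned into a small helper that builds the full learning-rate schedule from the top-layer rate. The function name and the example top-layer rate below are illustrative assumptions, not values from the paper.

```python
# Sketch of the eta^{l-1} = eta^l / 2.6 heuristic: choose the last
# layer's learning rate, then decay it going down the stack.

def layer_learning_rates(eta_top, num_layers, decay=2.6):
    """Return [eta^1, ..., eta^L], with eta^{l-1} = eta^l / decay."""
    rates = [eta_top / decay ** k for k in range(num_layers)]
    return list(reversed(rates))  # index 0 = lowest layer

# Hypothetical top-layer learning rate for a 4-layer model.
etas = layer_learning_rates(eta_top=0.01, num_layers=4)
print(etas)  # strictly increasing from the lowest to the last layer
```

The resulting list can be paired with the per-layer update above, or passed as per-group `lr` values when constructing an optimizer in a deep learning framework.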