
What is: Model-Agnostic Meta-Learning?

Source: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

MAML, or Model-Agnostic Meta-Learning, is a model and task-agnostic algorithm for meta-learning that trains a model’s parameters such that a small number of gradient updates will lead to fast learning on a new task.

Consider a model represented by a parametrized function $f_{\theta}$ with parameters $\theta$. When adapting to a new task $\mathcal{T}_{i}$, the model's parameters $\theta$ become $\theta'_{i}$. With MAML, the updated parameter vector $\theta'_{i}$ is computed using one or more gradient descent updates on task $\mathcal{T}_{i}$. For example, when using one gradient update,

$$\theta'_{i} = \theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)$$
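As a minimal sketch, this inner-loop update is a single gradient step. The snippet below assumes a hypothetical `loss_fn(params, batch)` standing in for the task loss $\mathcal{L}_{\mathcal{T}_{i}}$, with `params` (the $\theta$ of the equation) stored as a JAX pytree:

```python
import jax

def inner_update(params, batch, loss_fn, alpha=0.01):
    # theta'_i = theta - alpha * grad_theta L_{T_i}(f_theta)
    grads = jax.grad(loss_fn)(params, batch)
    # Apply the step leaf-wise across the parameter pytree.
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)
```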

The step size $\alpha$ may be fixed as a hyperparameter or meta-learned. The model parameters are trained by optimizing for the performance of $f_{\theta'_{i}}$ with respect to $\theta$ across tasks sampled from $p(\mathcal{T})$. More concretely, the meta-objective is as follows:

$$\min_{\theta} \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta'_{i}}\right) = \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)}\right)$$
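Continuing the sketch above, the meta-objective can be evaluated by adapting to each sampled task on a support batch and then scoring the adapted parameters on a query batch from the same task. The `(support_batch, query_batch)` layout of `tasks` is an assumed convention for illustration, not part of the original formulation:

```python
def meta_loss(params, tasks, loss_fn, alpha=0.01):
    # Sum of post-adaptation losses L_{T_i}(f_{theta'_i}) over sampled tasks.
    total = 0.0
    for support_batch, query_batch in tasks:
        adapted = inner_update(params, support_batch, loss_fn, alpha)  # theta'_i
        total = total + loss_fn(adapted, query_batch)
    return total
```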

Note that the meta-optimization is performed over the model parameters $\theta$, whereas the objective is computed using the updated model parameters $\theta'$. In effect, MAML aims to optimize the model parameters such that one or a small number of gradient steps on a new task will produce maximally effective behavior on that task. The meta-optimization across tasks is performed via stochastic gradient descent (SGD), such that the model parameters $\theta$ are updated as follows:

$$\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{\mathcal{T}_{i} \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta'_{i}}\right)$$

where $\beta$ is the meta step size.
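Under the same assumptions, the outer-loop SGD step differentiates `meta_loss` with respect to the original $\theta$. Because `meta_loss` contains the inner gradient step, this backpropagates through the adaptation itself, which is where MAML's second-order terms arise; an automatic-differentiation framework such as JAX handles this directly:

```python
def meta_update(params, tasks, loss_fn, alpha=0.01, beta=0.001):
    # theta <- theta - beta * grad_theta sum_i L_{T_i}(f_{theta'_i})
    meta_grads = jax.grad(meta_loss)(params, tasks, loss_fn, alpha)
    return jax.tree_util.tree_map(lambda p, g: p - beta * g, params, meta_grads)
```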