
What is: Model-Agnostic Meta-Learning?

Source: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

MAML, or Model-Agnostic Meta-Learning, is a model- and task-agnostic algorithm for meta-learning that trains a model's parameters so that a small number of gradient updates leads to fast learning on a new task.

Consider a model represented by a parametrized function $f_\theta$ with parameters $\theta$. When adapting to a new task $\mathcal{T}_i$, the model's parameters $\theta$ become $\theta'_i$. With MAML, the updated parameter vector $\theta'_i$ is computed using one or more gradient descent updates on task $\mathcal{T}_i$. For example, when using one gradient update,

$$\theta'_i = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}\left(f_\theta\right)$$
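
To make this step concrete, here is a minimal sketch of the inner-loop update in JAX. The network `apply_fn`, the loss `loss_fn`, and the `(x, y)` task batch format are illustrative assumptions, not part of the original paper:

```python
import jax

def inner_update(params, task_batch, apply_fn, loss_fn, alpha=0.01):
    """One gradient step on task T_i: theta'_i = theta - alpha * grad L_{T_i}(f_theta)."""
    def task_loss(p):
        x, y = task_batch
        return loss_fn(apply_fn(p, x), y)  # scalar task loss L_{T_i}(f_theta)
    grads = jax.grad(task_loss)(params)
    # Apply the gradient-descent step leaf-wise over the parameter pytree.
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)
```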

The step size $\alpha$ may be fixed as a hyperparameter or meta-learned. The model parameters are trained by optimizing for the performance of $f_{\theta'_i}$ with respect to $\theta$ across tasks sampled from $p(\mathcal{T})$. More concretely, the meta-objective is as follows:

$$\min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\left(f_{\theta'_i}\right) = \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\left(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)}\right)$$
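
Continuing the sketch above (same assumptions), the meta-objective evaluates each task's loss at the adapted parameters $\theta'_i$ and sums over sampled tasks; splitting each task into a `support` batch for adaptation and a `query` batch for meta-evaluation follows common MAML practice:

```python
def meta_objective(params, tasks, apply_fn, loss_fn, alpha=0.01):
    """Sum over tasks of the loss evaluated at the adapted parameters theta'_i."""
    total = 0.0
    for support, query in tasks:  # tasks sampled from p(T)
        adapted = inner_update(params, support, apply_fn, loss_fn, alpha)
        x_q, y_q = query
        total = total + loss_fn(apply_fn(adapted, x_q), y_q)
    return total  # scalar, so it can be differentiated w.r.t. params
```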

Note that the meta-optimization is performed over the model parameters $\theta$, whereas the objective is computed using the updated model parameters $\theta'$. In effect, MAML aims to optimize the model parameters such that one or a small number of gradient steps on a new task will produce maximally effective behavior on that task. The meta-optimization across tasks is performed via stochastic gradient descent (SGD), so the model parameters $\theta$ are updated as follows:

$$\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\left(f_{\theta'_i}\right)$$

where $\beta$ is the meta step size.
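
A minimal sketch of this meta-update under the same assumptions: because `inner_update` is itself differentiable JAX code, `jax.grad` backpropagates through the inner gradient step, yielding the second-order terms the exact MAML gradient contains.

```python
def meta_update(params, tasks, apply_fn, loss_fn, alpha=0.01, beta=0.001):
    """One SGD meta-step: theta <- theta - beta * grad sum_i L_{T_i}(f_{theta'_i})."""
    # Differentiating meta_objective w.r.t. the initial parameters theta
    # backpropagates through inner_update (second-order MAML).
    meta_grads = jax.grad(meta_objective)(params, tasks, apply_fn, loss_fn, alpha)
    return jax.tree_util.tree_map(lambda p, g: p - beta * g, params, meta_grads)
```

Repeatedly calling `meta_update` on freshly sampled batches of tasks implements the outer training loop; at test time, only `inner_update` is run on the new task.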