Poster in Workshop: Blog Track Poster Session
Strategies for Classification Layer Initialization in Model-Agnostic Meta-Learning
Nys Tjade Siegel · Thomas Goerttler · Klaus Obermayer
In a previous study, Raghu et al. [2020] found that in model-agnostic meta-learning (MAML) for few-shot classification, most of the change observed in the network during inner-loop fine-tuning occurs in the linear classification head. It is commonly believed that during this phase, the linear head remaps the encoded features to the classes of the new task. In traditional MAML, the weights of this final linear layer are meta-learned like the rest of the network. However, this approach has some issues:

First, it is hard to see how a single set of optimal last-layer weights could be learned at all. This becomes apparent when considering class label permutations: two tasks may contain the same classes but in a different order, so weights that perform well on the first task will likely not be effective on the second. This is reflected in the fact that MAML's performance can vary by up to 15% depending on the class label assignments during testing.

Second, more challenging benchmarks such as Meta-Dataset are being proposed for few-shot learning. These datasets have a varying number of classes per task, which makes it impossible to learn a single, fixed set of weights for the classification layer.

It therefore seems natural to ask how the final classification layer should be initialized before fine-tuning on a new task. Random initialization may not be optimal, as it can introduce unnecessary noise.

This blog post discusses different approaches to last-layer initialization that claim to outperform the original MAML method.
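To make the idea concrete, below is a minimal PyTorch-style sketch of replacing the meta-learned classification head with a freshly initialized one (e.g., zeros instead of random values) before adapting on a task's support set. The function and variable names (`adapt_to_task`, `encoder`, `init`, etc.) are illustrative assumptions, not code from the blog post or any of the referenced methods, and the sketch freezes the encoder during adaptation for brevity, whereas full MAML also updates the encoder in the inner loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adapt_to_task(encoder, feature_dim, n_way, support_x, support_y,
                  inner_steps=5, inner_lr=0.01, init="zero"):
    """Fine-tune a freshly initialized linear head on one task's support set.

    A zero-initialized head is the same for every task and therefore
    invariant to class-label permutations; random init is kept as a baseline.
    """
    head = nn.Linear(feature_dim, n_way)
    if init == "zero":
        nn.init.zeros_(head.weight)
        nn.init.zeros_(head.bias)
    # else: PyTorch's default random initialization is used.

    optimizer = torch.optim.SGD(head.parameters(), lr=inner_lr)
    with torch.no_grad():
        # Simplification: the meta-learned encoder is kept fixed here.
        features = encoder(support_x)

    for _ in range(inner_steps):
        logits = head(features)
        loss = F.cross_entropy(logits, support_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return head
```

Because the head is rebuilt per task rather than meta-learned, the same procedure also works when the number of classes varies across tasks, as in Meta-Dataset.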