Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

An important research direction in machine learning has centered around developing meta-learning algorithms to tackle few-shot learning. An especially successful algorithm has been Model-Agnostic Meta-Learning (MAML), a method that consists of two optimization loops: the outer loop finds a meta-initialization from which the inner loop can efficiently learn new tasks. Despite MAML's popularity, a fundamental open question remains: is the effectiveness of MAML due to the meta-initialization being primed for rapid learning (large, efficient changes in the representations), or due to feature reuse, with the meta-initialization already containing high-quality features? We investigate this question via ablation studies and analysis of the latent representations, finding that feature reuse is the dominant factor. This leads to the ANIL (Almost No Inner Loop) algorithm, a simplification of MAML in which we remove the inner loop for all but the (task-specific) head of a MAML-trained network. ANIL matches MAML's performance on benchmark few-shot image classification and reinforcement learning tasks, and offers computational improvements over MAML. We further study the precise contributions of the head and body of the network, showing that performance on the test tasks is entirely determined by the quality of the learned features, and that we can remove even the head of the network (the NIL algorithm). We conclude with a discussion of the rapid learning vs. feature reuse question for meta-learning algorithms more broadly.
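
The two-loop structure described above, and the ANIL simplification of restricting inner-loop adaptation to the task-specific head, can be made concrete with a short sketch. The snippet below is a minimal illustration rather than the authors' released code: the network split into a feature `body` and classifier `head`, and the names `inner_lr`, `support_x`, and `support_y` are all assumptions introduced here for clarity.

```python
# Minimal sketch (not the authors' code) contrasting the MAML and ANIL inner loops.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical small network: a feature extractor ("body") and a 5-way classifier ("head").
body = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 5)
inner_lr = 0.01  # assumed inner-loop step size

def inner_loop(support_x, support_y, adapt_body=True):
    """One inner-loop gradient step on a task's support set.

    adapt_body=True  -> MAML-style: every parameter receives an inner-loop update.
    adapt_body=False -> ANIL-style: only the head is adapted; the body's features are reused.
    """
    params = list(head.parameters()) + (list(body.parameters()) if adapt_body else [])
    loss = F.cross_entropy(head(body(support_x)), support_y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Return the adapted ("fast") weights; they are not written back into the modules here.
    return [p - inner_lr * g for p, g in zip(params, grads)]

# Example usage on random data standing in for a 5-way, 1-shot support set.
x, y = torch.randn(5, 28 * 28), torch.arange(5)
maml_fast_weights = inner_loop(x, y, adapt_body=True)   # adapts head and body
anil_fast_weights = inner_loop(x, y, adapt_body=False)  # adapts only the head
```

In full MAML, the fast weights would be used to compute the query-set loss, and the outer loop would update the meta-initialization by backpropagating through the inner step; ANIL keeps that outer loop unchanged and simply shrinks the set of inner-loop-adapted parameters to the head.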
