Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning

The model-agnostic meta-learning (MAML) algorithm has been widely adopted as a meta-learning approach due to its simplicity and effectiveness. However, the convergence of general multi-step MAML remains unexplored. In this paper, we develop a new theoretical framework to provide such a convergence guarantee for two types of objective functions that are of interest in practice: (a) the resampling case (e.g., reinforcement learning), where the loss functions take the form of an expectation and new data are sampled as the algorithm runs; and (b) the finite-sum case (e.g., supervised learning), where the loss functions take a finite-sum form over a given set of samples. For both cases, we characterize the convergence rate and the computational complexity to attain an $\epsilon$-accurate solution for multi-step MAML in the general nonconvex setting. In particular, our results suggest that the inner-stage stepsize needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence. From a technical perspective, we develop novel techniques to deal with the nested structure of the meta gradient for multi-step MAML, which can be of independent interest.
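To make the nested structure concrete, the following is a minimal sketch of the $N$-step inner loop and the resulting per-task meta gradient; the notation here ($w$ for the meta-initialization, $\alpha$ for the inner-stage stepsize, $\ell_i$ for the loss of task $i$, and $w_i^k$ for the task parameters after $k$ inner steps) is our own shorthand rather than the paper's, and the display is only an illustration of why small $\alpha$ matters, not a statement of the paper's results.

```latex
% Sketch of N-step MAML for a single task i (notation assumed, as described above).
% Inner loop: N gradient steps from the meta-initialization w.
\begin{align*}
  w_i^0 &= w, \qquad
  w_i^{k+1} = w_i^k - \alpha \nabla \ell_i(w_i^k), \quad k = 0, \dots, N-1, \\
% Meta gradient of the post-adaptation loss with respect to w (product ordered
% left to right in increasing k, using symmetry of the Hessians):
  \nabla_w \, \ell_i(w_i^N)
    &= \left[\prod_{k=0}^{N-1} \bigl(I - \alpha \nabla^2 \ell_i(w_i^k)\bigr)\right]
       \nabla \ell_i(w_i^N).
\end{align*}
% The meta gradient contains a product of N Hessian-dependent factors, so keeping
% alpha on the order of 1/N keeps this product bounded, which is consistent with the
% stepsize condition highlighted in the abstract.
```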
