Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning

The model-agnostic meta-learning (MAML) algorithm has been widely adopted as a meta-learning approach due to its simplicity and effectiveness. However, the convergence of general multi-step MAML remains unexplored. In this paper, we develop a new theoretical framework to provide such a convergence guarantee for two types of objective functions that are of interest in practice: (a) the resampling case (e.g., reinforcement learning), where the loss functions take the form of an expectation and new data are sampled as the algorithm runs; and (b) the finite-sum case (e.g., supervised learning), where the loss functions take a finite-sum form over a given set of samples. For both cases, we characterize the convergence rate and the computational complexity to attain an $\epsilon$-accurate solution for multi-step MAML in the general nonconvex setting. In particular, our results suggest that the inner-stage stepsize needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence. From a technical perspective, we develop novel techniques to deal with the nested structure of the meta gradient for multi-step MAML, which can be of independent interest.
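To make the nested structure concrete, the following is a minimal sketch of the $N$-step inner loop and the resulting per-task meta gradient; the notation here ($w$ for the meta-initialization, $\alpha$ for the inner-stage stepsize, $\ell_i$ for the loss of task $i$, and $w_i^k$ for the task parameters after $k$ inner steps) is our own shorthand rather than the paper's, and the display is only an illustration of why small $\alpha$ matters, not a statement of the paper's results.

```latex
% Sketch of N-step MAML for a single task i (notation assumed, as described above).
% Inner loop: N gradient steps from the meta-initialization w.
\begin{align*}
  w_i^0 &= w, \qquad
  w_i^{k+1} = w_i^k - \alpha \nabla \ell_i(w_i^k), \quad k = 0, \dots, N-1, \\
% Meta gradient of the post-adaptation loss with respect to w (product ordered
% left to right in increasing k, using symmetry of the Hessians):
  \nabla_w \, \ell_i(w_i^N)
    &= \left[\prod_{k=0}^{N-1} \bigl(I - \alpha \nabla^2 \ell_i(w_i^k)\bigr)\right]
       \nabla \ell_i(w_i^N).
\end{align*}
% The meta gradient contains a product of N Hessian-dependent factors, so keeping
% alpha on the order of 1/N keeps this product bounded, which is consistent with the
% stepsize condition highlighted in the abstract.
```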
