Multi-Step Model-Agnostic Meta-Learning: Convergence and Improved Algorithms

As a popular meta-learning approach, the model-agnostic meta-learning (MAML) algorithm has been widely used due to its simplicity and effectiveness. However, the convergence of the general multi-step MAML remains unexplored. In this paper, we develop a new theoretical framework under which we characterize the convergence rate and the computational complexity of multi-step MAML. Our results show that N-step MAML attains convergence with computational complexity that grows linearly with N, provided the inner stepsize is chosen properly. We then take a further step and develop a more efficient Hessian-free MAML. We first show that the existing zeroth-order Hessian estimator incurs a constant-level estimation error, which can make the MAML algorithm unstable. To address this issue, we propose a novel Hessian estimator based on a gradient-based Gaussian smoothing method, and show that it achieves much smaller estimation bias and variance, and that the resulting algorithm attains the same performance guarantee as the original MAML under mild conditions. Our experiments validate our theory and demonstrate the effectiveness of the proposed Hessian estimator.
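To make the Hessian-free idea concrete, the sketch below illustrates one standard way a gradient-based Gaussian-smoothing estimator can approximate a Hessian-vector product using only gradient evaluations; it is a minimal illustration of the general technique, not the paper's exact estimator, and all names (grad_f, mu, num_samples) and the control-variate choice are our own illustrative assumptions.

```python
# Minimal sketch of a gradient-based Gaussian-smoothing estimate of the
# Hessian-vector product H(w) v = grad^2 f(w) v, using only gradients of f.
# For the smoothed surrogate f_mu(w) = E_u[f(w + mu*u)] with u ~ N(0, I),
# one has grad^2 f_mu(w) = E_u[grad f(w + mu*u) u^T] / mu, so averaging
# (u . v) * (grad f(w + mu*u) - grad f(w)) / mu over Gaussian directions u
# estimates H(w) v; subtracting grad f(w) leaves the mean unchanged and
# only reduces variance. This is an illustrative sketch, not the paper's
# exact construction.

import numpy as np

def smoothed_hvp(grad_f, w, v, mu=1e-2, num_samples=100, rng=None):
    """Estimate grad^2 f(w) v from gradients of f via Gaussian smoothing."""
    rng = np.random.default_rng() if rng is None else rng
    g0 = grad_f(w)                        # baseline gradient (control variate)
    est = np.zeros_like(w)
    for _ in range(num_samples):
        u = rng.standard_normal(w.shape)  # Gaussian perturbation direction
        gu = grad_f(w + mu * u)           # gradient at the perturbed point
        est += (u @ v) * (gu - g0) / mu   # unbiased contribution to H v
    return est / num_samples

# Toy check on a quadratic f(w) = 0.5 w^T A w, whose exact HVP is A v.
if __name__ == "__main__":
    d = 5
    A = np.diag(np.arange(1.0, d + 1))
    grad_f = lambda w: A @ w
    w, v = np.ones(d), np.linspace(-1.0, 1.0, d)
    print("estimate :", smoothed_hvp(grad_f, w, v, num_samples=2000))
    print("exact A v:", A @ v)
```

In a Hessian-free MAML variant, an estimator of this form would replace the explicit second-order term in the meta-gradient, trading exact Hessian-vector products for extra gradient evaluations whose bias and variance are controlled by the smoothing parameter mu and the number of sampled directions.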
