Probabilistic Model-Agnostic Meta-Learning

Meta-learning for few-shot learning entails acquiring a prior over previous tasks and experiences, such that new tasks can be learned from small amounts of data. However, a critical challenge in few-shot learning is task ambiguity: even when a powerful prior can be meta-learned from a large number of prior tasks, a small dataset for a new task can simply be too ambiguous to acquire a single accurate model (e.g., a classifier) for that task. In this paper, we propose a probabilistic meta-learning algorithm that can sample models for a new task from a model distribution. Our approach extends model-agnostic meta-learning, which adapts to new tasks via gradient descent, to incorporate a parameter distribution that is trained via a variational lower bound. At meta-test time, our algorithm adapts via a simple procedure that injects noise into gradient descent, and at meta-training time, the model is trained such that this stochastic adaptation procedure produces samples from the approximate model posterior. Our experimental results show that our method can sample plausible classifiers and regressors in ambiguous few-shot learning problems. We also show how reasoning about ambiguity can be used for downstream active learning problems.
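To make the noise-injected adaptation step concrete, the sketch below illustrates one plausible reading of it: starting from meta-learned initial parameters, several candidate task models are sampled by perturbing the parameters with Gaussian noise before each inner gradient step on the support set. This is only a minimal illustration, not the paper's implementation; in particular, the function name `sample_adapted_params`, the generic `loss_fn`, and the fixed `noise_std` are hypothetical placeholders, whereas the paper trains the noise distribution itself through the variational lower bound.

```python
import torch


def sample_adapted_params(loss_fn, theta, support_x, support_y,
                          inner_lr=0.01, noise_std=0.1,
                          num_samples=5, num_steps=1):
    """Sample task-adapted parameter sets by injecting Gaussian noise
    into gradient-descent adaptation (illustrative sketch only).

    theta: list of meta-learned parameter tensors.
    loss_fn(params, x, y): returns a scalar loss for the given parameters
        on the task's support data (hypothetical interface).
    """
    samples = []
    for _ in range(num_samples):
        # Start each sample from the shared meta-learned initialization.
        phi = [p.detach().clone() for p in theta]
        for _ in range(num_steps):
            # Perturb the parameters with Gaussian noise before the gradient
            # step; a fixed noise_std stands in for the learned variance.
            phi = [(p + noise_std * torch.randn_like(p))
                   .detach().requires_grad_(True) for p in phi]
            loss = loss_fn(phi, support_x, support_y)
            grads = torch.autograd.grad(loss, phi)
            # Plain gradient-descent inner update, as in MAML.
            phi = [(p - inner_lr * g).detach() for p, g in zip(phi, grads)]
        samples.append(phi)
    return samples
```

Each returned parameter set corresponds to one sampled model for the new task; under this reading, disagreement among the sampled models' predictions is what signals task ambiguity and can drive downstream active learning.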
