Consolidating the Meta-Learning Zoo: A Unifying Perspective as Posterior Predictive Inference

A plethora of methods combining meta-learning [19, 21] with deep neural networks has recently been proposed, achieving great success in applications such as few-shot learning. Much of the existing work can be characterized as gradient-based [4, 17, 9], metric-based [22, 20], or amortized-MAP-based [8, 16]. Given this proliferation, a unifying view is valuable for understanding and improving these methods, yet existing frameworks [5, 9] are limited to specific families of approaches, namely gradient-based methods [17, 4]. In this paper we develop a framework for meta-learning approximate probabilistic inference for prediction, or ML-PIP for short, which provides this unifying perspective in terms of amortizing posterior predictive distributions. We show that ML-PIP re-frames and extends existing probabilistic interpretations of meta-learning [5, 9] to cover both point estimates and variational posteriors, as well as a broader class of methods, including gradient-based meta-learning [4, 17], metric-based meta-learning [20], amortized MAP inference [16], and conditional probability modelling [6, 7].
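To make the amortization idea concrete, the following is a minimal sketch of posterior predictive inference for few-shot classification, written in PyTorch (which the paper does not prescribe). The class name AmortizedPredictive, the mean-pooled set encoding, and the Gaussian form of q(w | D) over linear-classifier weights are illustrative assumptions, not the authors' exact architecture.

import math
import torch
import torch.nn as nn

class AmortizedPredictive(nn.Module):
    """Hypothetical sketch: a set encoder maps a task's support set directly
    to a distribution over task-specific classifier weights (no inner-loop
    gradient steps), and predictions average over samples of those weights."""

    def __init__(self, input_dim: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.n_classes, self.hidden = n_classes, hidden
        # Shared feature extractor h(x); a stand-in for a conv net.
        self.features = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        # Amortization heads: pooled support features -> mean and log-variance
        # of a Gaussian q(w | support set) over linear classifier weights.
        self.to_mu = nn.Linear(hidden, hidden * n_classes)
        self.to_logvar = nn.Linear(hidden, hidden * n_classes)

    def forward(self, support_x, query_x, n_samples: int = 10):
        h_support = self.features(support_x)            # (N_s, hidden)
        h_query = self.features(query_x)                # (N_q, hidden)
        context = h_support.mean(dim=0)                 # permutation-invariant pooling
        mu = self.to_mu(context).view(self.hidden, self.n_classes)
        std = (0.5 * self.to_logvar(context)).exp().view(self.hidden, self.n_classes)
        # Monte Carlo posterior predictive:
        #   p(y | x, D) ~= (1/S) sum_s softmax(h(x) @ w_s),  w_s ~ q(w | D)
        per_sample_logits = []
        for _ in range(n_samples):
            w = mu + std * torch.randn_like(std)        # reparameterized sample
            per_sample_logits.append(h_query @ w)       # (N_q, n_classes)
        log_probs = torch.stack(per_sample_logits).log_softmax(dim=-1)
        return log_probs.logsumexp(dim=0) - math.log(n_samples)  # log p(y | x, D)

# Meta-training would maximize the log posterior-predictive likelihood of each
# task's query labels across many tasks, e.g.:
#   loss = -log_pred[torch.arange(len(query_y)), query_y].mean()

Because the support set conditions the weight distribution through a single feed-forward pass, adapting to a new task requires no inner-loop optimization; this is what distinguishes amortized inference from gradient-based meta-learning, where task adaptation itself runs an optimizer.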

[1] Sebastian Nowozin et al. Decision-Theoretic Meta-Learning: Versatile and Efficient Amortization of Few-Shot Learning. ArXiv, 2018.

[2] R. T. Cox. Probability, frequency and reasonable expectation. 1990.

[3] Michael I. Jordan et al. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes. NIPS, 2001.

[4] P. Cochat et al. Et al. Archives de Pédiatrie, 2008.

[5] Yee Whye Teh et al. Neural Processes. ArXiv, 2018.

[6] Sebastian Thrun et al. Learning to Learn. Springer US, 1998.

[7] Max Welling et al. Auto-Encoding Variational Bayes. ICLR, 2013.

[8] Aarnout Brombacher et al. Probability... Qual. Reliab. Eng. Int., 2009.

[9] Tom Heskes et al. Empirical Bayes for Learning to Learn. ICML, 2000.

[10] Yee Whye Teh et al. Conditional Neural Processes. ICML, 2018.

[11] Sergey Levine et al. Probabilistic Model-Agnostic Meta-Learning. NeurIPS, 2018.

[12] Thomas L. Griffiths et al. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. ICLR, 2018.

[13] Diederik P. Kingma, Tim Salimans, and Max Welling. Variational Dropout and the Local Reparameterization Trick. NIPS, 2015.

[14] E. T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, 2003.

[15] Alexander J. Smola et al. Deep Sets. NIPS, 2017.

[16] Tom Heskes et al. Task Clustering and Gating for Bayesian Multitask Learning. JMLR, 2003.

[17] Richard S. Zemel et al. Prototypical Networks for Few-shot Learning. NIPS, 2017.

[18] Wei Shen et al. Few-Shot Image Recognition by Predicting Parameters from Activations. CVPR, 2018.

[19] Oriol Vinyals et al. Matching Networks for One Shot Learning. NIPS, 2016.

[20] Hugo Larochelle et al. Optimization as a Model for Few-Shot Learning. ICLR, 2017.

[21] G. Casella and R. L. Berger. Statistical Inference. Duxbury, 2002.

[22] Sergey Levine et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML, 2017.

[23] Alexander M. Rush et al. Semi-Amortized Variational Autoencoders. ICML, 2018.

[24] Daan Wierstra et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.