Decision-Theoretic Meta-Learning: Versatile and Efficient Amortization of Few-Shot Learning

This paper develops a general framework for data-efficient and versatile deep learning. The framework comprises three elements: 1) discriminative probabilistic models from multi-task learning that leverage statistical information shared across tasks; 2) a novel Bayesian decision-theoretic approach to meta-learning probabilistic inference across many tasks; and 3) a fast, flexible, and simple-to-train amortization network that can automatically generalize and extrapolate to a wide range of settings. The VERSA algorithm, a particular instance of the framework, is evaluated on a suite of supervised few-shot learning tasks. VERSA achieves state-of-the-art performance in one-shot learning on Omniglot and miniImagenet, and produces compelling results on a one-shot ShapeNet view reconstruction challenge.
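To make the idea of amortized few-shot inference concrete, the following is a minimal NumPy sketch, not the VERSA architecture itself. It assumes a linear classifier head whose per-class weight distribution is produced in a single forward pass from the pooled support examples of that class, and it approximates the predictive distribution by averaging over weight samples. The function name `amortized_predict`, the parameters `W_mu` and `W_logvar`, and the mean-pooling choice are all hypothetical and introduced only for illustration.

```python
import numpy as np


def amortized_predict(support_x, support_y, query_x, n_classes, params,
                      n_samples=10, rng=None):
    """Illustrative sketch: map a support set to class probabilities for queries.

    All names here are hypothetical; this is not the paper's implementation.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    W_mu, W_logvar = params  # assumed amortization weights, each of shape (d, d)

    means, logvars = [], []
    for c in range(n_classes):
        # Mean-pool the shots of class c so the result ignores their order.
        pooled = support_x[support_y == c].mean(axis=0)
        means.append(pooled @ W_mu)
        logvars.append(pooled @ W_logvar)
    mu = np.stack(means)                      # (n_classes, d)
    sigma = np.exp(0.5 * np.stack(logvars))   # (n_classes, d)

    probs = np.zeros((query_x.shape[0], n_classes))
    for _ in range(n_samples):
        # Sample task-specific classifier weights (reparameterized Gaussian).
        w = mu + sigma * rng.standard_normal(mu.shape)
        logits = query_x @ w.T
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        probs += p / p.sum(axis=1, keepdims=True)
    # Monte Carlo average of the per-sample softmax outputs.
    return probs / n_samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_classes = 4, 3
    support_x = rng.standard_normal((6, d))      # 2 shots for each of 3 classes
    support_y = np.array([0, 0, 1, 1, 2, 2])
    query_x = rng.standard_normal((5, d))
    params = (np.eye(d), np.zeros((d, d)))       # placeholder, untrained weights
    print(amortized_predict(support_x, support_y, query_x, n_classes, params).shape)
```

In this sketch, adapting to a new task requires no gradient steps at test time: the support set is pushed through the (assumed) amortization weights once, which is the sense in which inference is amortized.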
