Meta-Learning with Shared Amortized Variational Inference

We propose a novel amortized variational inference scheme for an empirical Bayes meta-learning model in which model parameters are treated as latent variables. We learn the prior distribution over model parameters, conditioned on limited training data, using a variational autoencoder approach. Our framework shares the same amortized inference network between the conditional prior and the variational posterior over the model parameters: the posterior leverages both the labeled support and query data, while the conditional prior is conditioned only on the labeled support data. We show that in earlier work relying on Monte-Carlo approximation, the conditional prior collapses to a Dirac delta function, whereas our variational approach prevents this collapse and preserves uncertainty over the model parameters. We evaluate our approach on the miniImageNet, CIFAR-FS, and FC100 datasets, and present results demonstrating its advantages over previous work.
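
The sketch below illustrates the shared-inference-network idea described above: a single amortized encoder maps a labeled set to a Gaussian over task-specific model parameters, producing the conditional prior from the support set alone and the posterior from the support and query sets together, with a reparameterized sample and a KL term forming a per-task ELBO. This is a minimal illustration under assumed names and shapes (SharedAmortizedEncoder, classifier_fn, the pooling and network sizes), not the paper's actual implementation.

```python
# Minimal sketch (PyTorch) of a shared amortized inference network used for
# both the conditional prior p(w | support) and the posterior q(w | support, query).
# All class/function names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence


class SharedAmortizedEncoder(nn.Module):
    """Maps a set of (feature, one-hot label) pairs to a Gaussian over parameters."""

    def __init__(self, feat_dim, n_classes, param_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + n_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, param_dim)
        self.log_sigma = nn.Linear(hidden, param_dim)

    def forward(self, feats, one_hot_labels):
        # Permutation-invariant set encoding via mean pooling over examples.
        h = self.net(torch.cat([feats, one_hot_labels], dim=-1)).mean(dim=0)
        return Normal(self.mu(h), self.log_sigma(h).exp())


def task_elbo(encoder, classifier_fn, support, query, beta=1.0):
    """Per-task ELBO: prior conditioned on support only, posterior on support + query.

    `support` and `query` are (features, one-hot labels) tuples; `classifier_fn(w, x)`
    is assumed to return a categorical predictive distribution given sampled parameters w.
    """
    s_x, s_y = support
    q_x, q_y = query
    prior = encoder(s_x, s_y)                                          # p(w | support)
    posterior = encoder(torch.cat([s_x, q_x]), torch.cat([s_y, q_y]))  # q(w | support, query)
    w = posterior.rsample()                                            # reparameterization trick
    log_lik = classifier_fn(w, q_x).log_prob(q_y.argmax(dim=-1)).sum()
    kl = kl_divergence(posterior, prior).sum()
    return log_lik - beta * kl
```

Because the posterior sees the query labels while the prior does not, the KL term keeps the support-conditioned prior close to a distribution that still explains the query data, which is what discourages it from collapsing to a point estimate.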
