Gaussian Process Meta Few-shot Classifier Learning via Linear Discriminant Laplace Approximation

The meta learning few-shot classification is an emerging problem in machine learning that received enormous attention recently, where the goal is to learn a model that can quickly adapt to a new task with only a few labeled data. We consider the Bayesian Gaussian process (GP) approach, in which we meta-learn the GP prior, and the adaptation to a new task is carried out by the GP predictive model from the posterior inference. We adopt the Laplace posterior approximation, but to circumvent the iterative gradient steps for finding the MAP solution, we introduce a novel linear discriminant analysis (LDA) plugin as a surrogate for the MAP solution. In essence, the MAP solution is approximated by the LDA estimate, but to take the GP prior into account, we adopt the prior-norm adjustment to estimate LDA’s shared variance parameters, which ensures that the adjusted estimate is consistent with the GP prior. This enables closed-form differentiable GP posteriors and predictive distributions, thus allowing fast meta training. We demonstrate considerable improvement over the previous approaches.

[1]  Wei Shen,et al.  Few-Shot Image Recognition by Predicting Parameters from Activations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Jürgen Schmidhuber,et al.  Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks , 1992, Neural Computation.

[3]  Joshua B. Tenenbaum,et al.  Meta-Learning for Semi-Supervised Few-Shot Classification , 2018, ICLR.

[4]  Sebastian Nowozin,et al.  Meta-Learning Probabilistic Inference for Prediction , 2018, ICLR.

[5]  Gregory Cohen,et al.  EMNIST: Extending MNIST to handwritten letters , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[6]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Michael I. Jordan,et al.  On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.

[9]  Amos Storkey,et al.  Meta-Learning in Neural Networks: A Survey , 2020, IEEE transactions on pattern analysis and machine intelligence.

[10]  Amos Storkey,et al.  Learning to learn via Self-Critique , 2019, ArXiv.

[11]  Yan Wang,et al.  SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning , 2019, ArXiv.

[12]  Charles Kemp,et al.  How to Grow a Mind: Statistics, Structure, and Abstraction , 2011, Science.

[13]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[14]  Yee Whye Teh,et al.  MetaFun: Meta-Learning with Iterative Functional Updates , 2020, ICML.

[15]  Yoshua Bengio,et al.  Bayesian Model-Agnostic Meta-Learning , 2018, NeurIPS.

[16]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[17]  Thomas L. Griffiths,et al.  Reconciling meta-learning and continual learning with online mixtures of tasks , 2018, NeurIPS.

[18]  C A Nelson,et al.  Learning to Learn , 2017, Encyclopedia of Machine Learning and Data Mining.

[19]  Alexandre Lacoste,et al.  TADAM: Task dependent adaptive metric for improved few-shot learning , 2018, NeurIPS.

[20]  T. Griffiths,et al.  Probabilistic inference in human semantic memory , 2006, Trends in Cognitive Sciences.

[21]  Andrew Gordon Wilson,et al.  Deep Kernel Learning , 2015, AISTATS.

[22]  Yu-Chiang Frank Wang,et al.  A Closer Look at Few-shot Classification , 2019, ICLR.

[23]  Zheng Zhang,et al.  Negative Margin Matters: Understanding Margin in Few-shot Classification , 2020, ECCV.

[24]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[25]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[26]  Yoshua Bengio,et al.  On the Optimization of a Synaptic Learning Rule , 2007 .

[27]  Wittawat Jitkrittum,et al.  Large sample analysis of the median heuristic , 2017, 1707.07269.

[28]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[30]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Elliot J. Crowley,et al.  Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels , 2020, NeurIPS.

[32]  Hung-Yu Tseng,et al.  Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation , 2020, ICLR.

[33]  Pieter Abbeel,et al.  A Simple Neural Attentive Meta-Learner , 2017, ICLR.

[34]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[35]  Joshua B. Tenenbaum,et al.  One-shot learning by inverting a compositional causal process , 2013, NIPS.

[36]  Fei Sha,et al.  Few-Shot Learning via Embedding Adaptation With Set-to-Set Functions , 2018, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Thomas L. Griffiths,et al.  Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.

[38]  Aske Plaat,et al.  A survey of deep meta-learning , 2020, Artificial Intelligence Review.

[39]  James T. Kwok,et al.  Generalizing from a Few Examples , 2019, ACM Comput. Surv..

[40]  Kilian Q. Weinberger,et al.  On Calibration of Modern Neural Networks , 2017, ICML.

[41]  Timothy Hospedales,et al.  Shallow Bayesian Meta Learning for Real-World Few-Shot Recognition , 2021, ArXiv.