Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels

Recently, different machine learning methods have been introduced to tackle the challenging few-shot learning scenario, that is, learning from a small labeled dataset related to a specific task. Common approaches have taken the form of meta-learning: learning to learn on the new problem given the old. Following the recognition that meta-learning is implementing learning in a multi-level model, we present a Bayesian treatment for the meta-learning inner loop through the use of deep kernels. As a result we can learn a kernel that transfers to new tasks; we call this Deep Kernel Transfer (DKT). This approach has many advantages: it is straightforward to implement as a single optimizer, it provides uncertainty quantification, and it does not require estimation of task-specific parameters. We empirically demonstrate that DKT outperforms several state-of-the-art algorithms in few-shot classification, and achieves state-of-the-art results in cross-domain adaptation and regression. We conclude that complex meta-learning routines can be replaced by a simpler Bayesian model without loss of accuracy.
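To make the idea concrete, here is a minimal sketch of the few-shot regression case, assuming a GPyTorch-style exact GP: a neural feature extractor is composed with a base kernel to form a deep kernel, and a single optimizer jointly updates the network weights and the kernel hyperparameters by maximizing the GP marginal likelihood on sampled tasks. The backbone architecture and the `sample_tasks` helper below are hypothetical placeholders, not the paper's exact setup.

```python
import torch
import gpytorch


class FeatureExtractor(torch.nn.Module):
    """Hypothetical backbone mapping inputs to a low-dimensional embedding."""

    def __init__(self, in_dim, out_dim=16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class DeepKernelGP(gpytorch.models.ExactGP):
    """Exact GP whose base kernel acts on learned features: a deep kernel."""

    def __init__(self, train_x, train_y, likelihood, feature_extractor):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel())
        self.feature_extractor = feature_extractor

    def forward(self, x):
        z = self.feature_extractor(x)
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z))


def sample_tasks(num_tasks=100, shots=5):
    # Hypothetical task sampler: random sinusoids, as in common
    # few-shot regression benchmarks (amplitude and phase vary per task).
    for _ in range(num_tasks):
        amp = torch.rand(1) * 4.0 + 0.1
        phase = torch.rand(1) * 3.14
        x = torch.rand(shots, 1) * 10.0 - 5.0
        yield x, amp * torch.sin(x.squeeze(-1) + phase)


# A single optimizer over both the kernel hyperparameters and the
# network weights: no inner loop, no task-specific parameters.
feature_extractor = FeatureExtractor(in_dim=1)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
dummy_x, dummy_y = torch.zeros(5, 1), torch.zeros(5)
model = DeepKernelGP(dummy_x, dummy_y, likelihood, feature_extractor)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train(); likelihood.train()
for task_x, task_y in sample_tasks():
    # Condition the GP on the current task, then maximize the
    # marginal likelihood of that task under the shared deep kernel.
    model.set_train_data(task_x, task_y, strict=False)
    optimizer.zero_grad()
    loss = -mll(model(task_x), task_y)
    loss.backward()
    optimizer.step()
```

Because the marginal likelihood integrates out the task-specific function, no inner-loop adaptation is needed; at test time the same learned kernel conditions a GP on the support set of a new task, yielding predictions with uncertainty estimates.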
