An Ensemble of Epoch-Wise Empirical Bayes for Few-Shot Learning

Few-shot learning aims to train efficient predictive models with a few examples. The lack of training data leads to poor models that perform high-variance or low-confidence predictions. In this paper, we propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions. "Epoch-wise" means that each training epoch has a Bayes model whose parameters are specifically learned and deployed. "Empirical" means that the hyperparameters, e.g., used for learning and ensembling the epoch-wise models, are generated by hyperprior learners conditional on task-specific data. We introduce four kinds of hyperprior learners by considering inductive vs. transductive, and epoch-dependent vs. epoch-independent, in the paradigm of meta-learning. We conduct extensive experiments for five-class few-shot tasks on three challenging benchmarks: miniImageNet, tieredImageNet, and FC100, and achieve top performance using the epoch-dependent transductive hyperprior learner, which captures the richest information. Our ablation study shows that both "epoch-wise ensemble" and "empirical" encourage high efficiency and robustness in the model performance.

[1]  Max Jaderberg,et al.  Population Based Training of Neural Networks , 2017, ArXiv.

[2]  Seong-Whan Lee,et al.  Few-Shot Learning With Geometric Constraints , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[3]  Prabhat,et al.  Scalable Bayesian Optimization Using Deep Neural Networks , 2015, ICML.

[4]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Sergey Levine,et al.  Probabilistic Model-Agnostic Meta-Learning , 2018, NeurIPS.

[6]  Ameet Talwalkar,et al.  Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization , 2016, J. Mach. Learn. Res..

[7]  Joan Bruna,et al.  Few-Shot Learning with Graph Neural Networks , 2017, ICLR.

[8]  Bernt Schiele,et al.  Learning to Self-Train for Semi-Supervised Few-Shot Classification , 2019, NeurIPS.

[9]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[10]  Kilian Q. Weinberger,et al.  Snapshot Ensembles: Train 1, get M for free , 2017, ICLR.

[11]  Pieter Abbeel,et al.  A Simple Neural Attentive Meta-Learner , 2017, ICLR.

[12]  Feiyue Huang,et al.  LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning , 2019, ICML.

[13]  Guosheng Lin,et al.  DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning , 2020 .

[14]  Fatih Porikli,et al.  A Unified Approach for Conventional Zero-Shot, Generalized Zero-Shot, and Few-Shot Learning , 2017, IEEE Transactions on Image Processing.

[15]  Subhransu Maji,et al.  Meta-Learning With Differentiable Convex Optimization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Shimon Ullman,et al.  Cross-generalization: learning novel classes from a single example by feature replacement , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Kevin Leyton-Brown,et al.  Sequential Model-Based Optimization for General Algorithm Configuration , 2011, LION.

[18]  Yoshua Bengio,et al.  Gradient-Based Optimization of Hyperparameters , 2000, Neural Computation.

[19]  Justin Domke,et al.  Generic Methods for Optimization-Based Modeling , 2012, AISTATS.

[20]  Fatos T. Yarman Vural,et al.  A New Fuzzy Stacked Generalization Technique and Analysis of its Performance , 2012, 1204.0171.

[21]  Joshua B. Tenenbaum,et al.  Meta-Learning for Semi-Supervised Few-Shot Classification , 2018, ICLR.

[22]  Leo Breiman,et al.  Stacked regressions , 2004, Machine Learning.

[23]  J. Friedman Stochastic gradient boosting , 2002 .

[24]  Amos J. Storkey,et al.  How to train your MAML , 2018, ICLR.

[25]  Martial Hebert,et al.  Low-Shot Learning from Imaginary Data , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Guosheng Lin,et al.  DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover’s Distance and Structured Classifiers , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jascha Sohl-Dickstein,et al.  Meta-Learning Update Rules for Unsupervised Representation Learning , 2018, ICLR.

[28]  Sebastian Thrun,et al.  Learning to Learn: Introduction and Overview , 1998, Learning to Learn.

[29]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[30]  Xiangyang Xue,et al.  Multi-Level Semantic Feature Augmentation for One-Shot Learning , 2018, IEEE Transactions on Image Processing.

[31]  Taesup Kim,et al.  Edge-Labeling Graph Neural Network for Few-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yu-Chiang Frank Wang,et al.  A Closer Look at Few-shot Classification , 2019, ICLR.

[33]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[34]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[35]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[36]  Thomas L. Griffiths,et al.  Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.

[37]  Bernt Schiele,et al.  A Domain Based Approach to Social Relation Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Seungjin Choi,et al.  Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace , 2018, ICML.

[39]  Padhraic Smyth,et al.  Linearly Combining Density Estimators via Stacking , 1999, Machine Learning.

[40]  Bernt Schiele,et al.  Meta-Transfer Learning for Few-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Jasper Snoek,et al.  Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[42]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Bernt Schiele,et al.  Mnemonics Training: Multi-Class Incremental Learning Without Forgetting , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Trevor Darrell,et al.  Frustratingly Simple Few-Shot Object Detection , 2020, ICML.

[45]  Rich Caruana,et al.  Learning Many Related Tasks at the Same Time with Backpropagation , 1994, NIPS.

[46]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[47]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[48]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[49]  Razvan Pascanu,et al.  Meta-Learning with Latent Embedding Optimization , 2018, ICLR.

[50]  Xilin Chen,et al.  Cross Attention Network for Few-shot Classification , 2019, NeurIPS.

[51]  Lars Schmidt-Thieme,et al.  Beyond Manual Tuning of Hyperparameters , 2015, KI - Künstliche Intelligenz.

[52]  Martial Hebert,et al.  Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs , 2016, NIPS.

[53]  Zhi Zhang,et al.  Bag of Tricks for Image Classification with Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Timo Aila,et al.  Temporal Ensembling for Semi-Supervised Learning , 2016, ICLR.

[56]  Xiaogang Wang,et al.  Finding Task-Relevant Features for Few-Shot Learning by Category Traversal , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Tapani Raiko,et al.  Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters , 2015, ICML.

[58]  Chi Zhang,et al.  Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[59]  Hang Li,et al.  Meta-SGD: Learning to Learn Quickly for Few Shot Learning , 2017, ArXiv.

[60]  Alexandre Lacoste,et al.  TADAM: Task dependent adaptive metric for improved few-shot learning , 2018, NeurIPS.

[61]  Eunho Yang,et al.  Learning to Propagate Labels: Transductive Propagation Network for Few-Shot Learning , 2018, ICLR.

[62]  Mark J. van der Laan,et al.  The relative performance of ensemble methods with deep convolutional neural networks for image classification , 2017, Journal of applied statistics.

[63]  Rui Yao,et al.  CANet: Class-Agnostic Segmentation Networks With Iterative Refinement and Attentive Few-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Xuming He,et al.  A Dual Attention Network with Semantic Embedding for Few-Shot Learning , 2019, AAAI.

[65]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[66]  Yoshua Bengio,et al.  Bayesian Model-Agnostic Meta-Learning , 2018, NeurIPS.

[67]  Paolo Frasconi,et al.  Bilevel Programming for Hyperparameter Optimization and Meta-Learning , 2018, ICML.

[68]  Anders Krogh,et al.  Learning with ensembles: How overfitting can be useful , 1995, NIPS.

[69]  Wei Shen,et al.  Few-Shot Image Recognition by Predicting Parameters from Activations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[70]  Ludmila I. Kuncheva,et al.  Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy , 2003, Machine Learning.

[71]  Ryan P. Adams,et al.  Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.

[72]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[73]  Chong Wang,et al.  Stochastic variational inference , 2012, J. Mach. Learn. Res..

[74]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[75]  Luc Van Gool,et al.  Natural and Effective Obfuscation by Head Inpainting , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[76]  David M. Blei,et al.  Variational Inference: A Review for Statisticians , 2016, ArXiv.

[77]  Tsendsuren Munkhdalai,et al.  Rapid Adaptation with Conditionally Shifted Neurons , 2017, ICML.

[78]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[79]  Jasper Snoek,et al.  Input Warping for Bayesian Optimization of Non-Stationary Functions , 2014, ICML.

[80]  Pablo G. Moreno,et al.  Empirical Bayes Meta-Learning with Synthetic Gradients , 2019 .

[81]  Ming-Ming Cheng,et al.  Nonlinear Regression via Deep Negative Correlation Learning , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[82]  Bernt Schiele,et al.  Meta-Transfer Learning Through Hard Tasks , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[83]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[84]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[85]  Hong Yu,et al.  Meta Networks , 2017, ICML.

[86]  Richard J. Mammone,et al.  Meta-neural networks that learn by learning , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[87]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[88]  Yoshua Bengio,et al.  MetaGAN: An Adversarial Approach to Few-Shot Learning , 2018, NeurIPS.

[89]  Cordelia Schmid,et al.  Diversity With Cooperation: Ensemble Methods for Few-Shot Classification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[90]  Geoffrey E. Hinton Using fast weights to deblur old memories , 1987 .

[91]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[92]  Bernt Schiele,et al.  F-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[93]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.