Paper Information - An Ensemble of Epoch-Wise Empirical Bayes for Few-Shot Learning

Few-shot learning aims to train efficient predictive models with a few examples. The lack of training data leads to poor models that make high-variance or low-confidence predictions. In this paper, we propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions. "Epoch-wise" means that each training epoch has a Bayes model whose parameters are specifically learned and deployed. "Empirical" means that the hyperparameters, e.g., those used for learning and ensembling the epoch-wise models, are generated by hyperprior learners conditional on task-specific data. We introduce four kinds of hyperprior learners by considering inductive vs. transductive and epoch-dependent vs. epoch-independent settings within the meta-learning paradigm. We conduct extensive experiments on five-class few-shot tasks on three challenging benchmarks (miniImageNet, tieredImageNet, and FC100) and achieve top performance using the epoch-dependent transductive hyperprior learner, which captures the richest information. Our ablation study shows that both the "epoch-wise ensemble" and the "empirical" components encourage high efficiency and robustness in model performance.
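The epoch-wise ensemble described in the abstract can be illustrated with a minimal sketch. This is not the authors' E3BM implementation: the linear softmax classifier, the function name `epoch_wise_ensemble`, and the fixed per-epoch learning rates and ensemble weights are all assumptions made for illustration. In E3BM these hyperparameters are generated by meta-learned hyperprior learners conditioned on task-specific data; here they are simply passed in as arrays.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def epoch_wise_ensemble(x_support, y_support, x_query, lrs, weights, n_classes):
    """Weighted sum of per-epoch query predictions (illustrative only).

    lrs[m] is the learning rate used in epoch m and weights[m] is the
    ensemble weight of the model obtained after epoch m. Both are
    hypothetical stand-ins for the outputs of hyperprior learners.
    """
    W = np.zeros((x_support.shape[1], n_classes))  # linear classifier, assumed
    onehot = np.eye(n_classes)[y_support]
    ensemble = np.zeros((x_query.shape[0], n_classes))
    for lr, v in zip(lrs, weights):
        # One gradient-descent epoch on the support-set cross-entropy loss.
        probs = softmax(x_support @ W)
        grad = x_support.T @ (probs - onehot) / len(y_support)
        W = W - lr * grad
        # The model after this epoch contributes its weighted prediction.
        ensemble += v * softmax(x_query @ W)
    return ensemble


# Toy 5-way 1-shot task with well-separated synthetic features.
rng = np.random.default_rng(0)
labels = np.arange(5)
centers = 5.0 * np.eye(5)
x_s = centers + 0.1 * rng.standard_normal((5, 5))
x_q = centers + 0.1 * rng.standard_normal((5, 5))
lrs = np.full(4, 0.5)             # placeholder per-epoch learning rates
weights = np.full(4, 0.25)        # placeholder ensemble weights, sum to 1
pred = epoch_wise_ensemble(x_s, labels, x_q, lrs, weights, n_classes=5)
print(pred.argmax(axis=1))
```

Because the ensemble weights in this sketch sum to one, each row of `pred` is a convex combination of per-epoch class distributions and is itself a valid distribution. An epoch-dependent hyperprior learner would correspond to letting each entry of `lrs` and `weights` depend on the epoch and the task data rather than being fixed in advance.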
