Neural Ensemble Search for Uncertainty Estimation and Dataset Shift

Ensembles of neural networks achieve superior performance compared to stand-alone networks in terms of accuracy, uncertainty calibration, and robustness to dataset shift. \emph{Deep ensembles}, a state-of-the-art method for uncertainty estimation, only ensemble random initializations of a \emph{fixed} architecture. Instead, we propose two methods for automatically constructing ensembles with \emph{varying} architectures, which implicitly trade off individual architectures' strengths against the ensemble's diversity and exploit architectural variation as a source of that diversity. On a variety of classification tasks and modern architecture search spaces, we show that the resulting ensembles outperform deep ensembles not only in accuracy but also in uncertainty calibration and robustness to dataset shift. Our further analysis and ablation studies provide evidence of higher ensemble diversity due to architectural variation, resulting in ensembles that can outperform deep ensembles even when their average base learner is weaker. To foster reproducibility, our code is available at \url{https://github.com/automl/nes}
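The abstract does not spell out how the base learners are combined, so as a rough illustration (not the paper's implementation) of composing an ensemble from a pool of networks trained with varying architectures, here is a minimal sketch of Caruana-style forward selection that greedily adds whichever pool member most reduces the ensemble's validation loss. The function names, the with-replacement selection, and the choice of negative log-likelihood as the objective are all assumptions made for the example.

\begin{verbatim}
import numpy as np

def ensemble_nll(member_probs, labels):
    # Negative log-likelihood of the uniform average of member predictions.
    avg = np.mean(member_probs, axis=0)  # shape: (n_val, n_classes)
    return -np.mean(np.log(avg[np.arange(len(labels)), labels] + 1e-12))

def forward_select(pool_probs, labels, ensemble_size):
    # Greedy forward selection (with replacement) from a pool of trained
    # networks, each possibly having a different architecture.
    #   pool_probs: list of (n_val, n_classes) softmax outputs, one per network
    #   labels:     (n_val,) integer validation labels
    # Returns indices into the pool; repeats implicitly up-weight strong members.
    chosen = []
    for _ in range(ensemble_size):
        best_idx, best_loss = None, np.inf
        for i, probs in enumerate(pool_probs):
            candidate = [pool_probs[j] for j in chosen] + [probs]
            loss = ensemble_nll(np.stack(candidate), labels)
            if loss < best_loss:
                best_idx, best_loss = i, loss
        chosen.append(best_idx)
    return chosen
\end{verbatim}

Because the selection objective is evaluated on the averaged prediction rather than on each member alone, a member with a weaker stand-alone loss can still be chosen when its architecture makes errors complementary to those already in the ensemble, which is one way the strength-versus-diversity trade-off described above can emerge.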
