Greedy Bayesian Posterior Approximation with Deep Ensembles

Ensembles of independently trained neural networks are a state-of-the-art approach to estimating predictive uncertainty in deep learning, and can be interpreted as an approximation of the posterior distribution via a mixture of delta functions. The training of ensembles relies on the non-convexity of the loss landscape and the random initialization of the individual members, leaving the resulting posterior approximation uncontrolled. This paper proposes a novel and principled method to tackle this limitation by minimizing an f-divergence between the true posterior and a kernel density estimator (KDE) in a function space. We analyze this objective from a combinatorial point of view and show that it is submodular with respect to the mixture components for any f. Subsequently, we consider the problem of greedy ensemble construction and, from the marginal gain of the total objective, derive a novel diversity term for ensemble methods. The performance of our approach is demonstrated on computer vision out-of-distribution benchmarks across a range of architectures trained on multiple datasets. The source code of our method is publicly available at https://github.com/MIPT-Oulu/greedy_ensembles_training.
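
For intuition, the abstract combines two ingredients: an f-divergence objective, under the standard convention D_f(p || q) = ∫ q(θ) f(p(θ)/q(θ)) dθ, where the ensemble plays the role of the mixture q, and greedy construction driven by the marginal gain of that objective. The following is only a minimal illustrative sketch of the generic greedy, marginal-gain scheme in Python; the names (greedy_ensemble, objective, candidates) are hypothetical and do not correspond to the authors' released code, which trains each new member with the derived diversity term rather than selecting members from a fixed candidate pool.

    # Illustrative sketch: build a set S by repeatedly adding the candidate
    # with the largest marginal gain F(S + {m}) - F(S). Here `objective` is a
    # stand-in for a (negative) f-divergence surrogate; all names are
    # hypothetical, not the paper's API.
    def greedy_ensemble(candidates, objective, ensemble_size):
        selected = []
        remaining = list(candidates)
        for _ in range(ensemble_size):
            base = objective(selected)
            # Marginal gain of each remaining candidate given the current ensemble.
            gains = [(objective(selected + [m]) - base, i) for i, m in enumerate(remaining)]
            best_gain, best_idx = max(gains)
            selected.append(remaining.pop(best_idx))
        return selected

When the objective is monotone submodular under a cardinality constraint, this greedy rule is the classical baseline with a (1 - 1/e) approximation guarantee, which is why analyzing the marginal gain of the total objective directly yields a diversity term for the member being added.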
