Selective Ensembles for Consistent Predictions

Recent work has shown that models trained to the same objective, and which achieve similar measures of accuracy on consistent test data, may nonetheless behave very differently on individual predictions. This inconsistency is undesirable in high-stakes contexts, such as medical diagnosis and finance. We show that this inconsistent behavior extends beyond predictions to feature attributions, which may likewise have negative implications for the intelligibility of a model, and one's ability to find recourse for subjects. We then introduce selective ensembles to mitigate such inconsistencies by applying hypothesis testing to the predictions of a set of models trained using randomly-selected starting conditions; importantly, selective ensembles can abstain in cases where a consistent outcome cannot be achieved up to a specified confidence level. We prove that prediction disagreement between selective ensembles is bounded, and empirically demonstrate that selective ensembles achieve consistent predictions and feature attributions while maintaining low abstention rates. On several benchmark datasets, selective ensembles reach zero inconsistently predicted points, with abstention rates as low as 1.5%.
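
The abstract describes predicting only when a hypothesis test on the ensemble's votes reaches a given confidence level, and abstaining otherwise. The following is a minimal sketch of that idea, assuming a one-sided binomial test on the top-two vote counts; the `models` list, the significance level `alpha`, and the `ABSTAIN` sentinel are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter
from math import comb
from typing import Callable, Optional, Sequence

ABSTAIN = None  # sentinel value returned when the ensemble abstains


def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """P[X >= k] for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def selective_predict(
    models: Sequence[Callable[[object], int]],  # each model maps an input to a class label
    x: object,
    alpha: float = 0.05,
) -> Optional[int]:
    """Return the plurality class of the ensemble, or abstain (None) when a
    one-sided binomial test on the top-two vote counts cannot reject, at
    level alpha, the hypothesis that the runner-up class is equally likely."""
    votes = Counter(m(x) for m in models)
    (top_class, top_count), *rest = votes.most_common(2)
    runner_up_count = rest[0][1] if rest else 0
    # Under the null hypothesis, the top class wins each head-to-head vote
    # against the runner-up with probability 1/2.
    p_value = binom_sf(top_count, top_count + runner_up_count, 0.5)
    return top_class if p_value <= alpha else ABSTAIN
```

In this sketch, the `models` would be networks trained on the same data with different random seeds (the "randomly-selected starting conditions" in the abstract); for example, with ten models voting 6-to-4 the test does not reject at alpha = 0.05 and the ensemble abstains, whereas a unanimous vote yields a p-value of 2^-10 and the ensemble predicts.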
