Wisdom of the Ensemble: Improving Consistency of Deep Learning Models

Deep learning classifiers assist humans in making decisions, so users' trust in these models is of paramount importance. Trust is often a function of consistent behavior: from an AI model's perspective, given the same input, the user expects the same output, especially when that output is correct, i.e., consistently correct outputs. This paper studies model behavior in the context of periodic retraining of deployed models, where the outputs of successive generations of a model may not agree on the correct labels assigned to the same input. We formally define the consistency and correct-consistency of a learning model. We prove that the consistency and correct-consistency of an ensemble learner are not less than the average consistency and correct-consistency of its individual learners, and that correct-consistency can be improved with a probability guarantee by combining learners whose accuracy is not less than the average accuracy of the ensemble's component learners. To validate the theory, we also propose an efficient dynamic snapshot ensemble method and demonstrate its value on three datasets with two state-of-the-art deep learning classifiers.
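As a rough illustration of the quantities the abstract refers to, one plausible formalization (a sketch only; the paper's formal definitions may differ) compares two successive generations f_1 and f_2 of a classifier trained for the same task, with an ensemble F_g built from M component learners f_g^{(1)}, ..., f_g^{(M)} in each generation g:

\mathrm{Consistency}(f_1, f_2) = \Pr_{x}\big[f_1(x) = f_2(x)\big], \qquad \mathrm{CorrectConsistency}(f_1, f_2) = \Pr_{(x,y)}\big[f_1(x) = f_2(x) = y\big],

\mathrm{Consistency}(F_1, F_2) \;\ge\; \frac{1}{M}\sum_{i=1}^{M}\mathrm{Consistency}\big(f_1^{(i)}, f_2^{(i)}\big),

with the analogous inequality for correct-consistency. The symbols f_g^{(i)} and F_g are introduced here only for this illustration and are not taken from the paper.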
