Optimal Ensembles for Deep Learning Classification: Theory and Practice

Ensemble methods for classification construct a set of models, often called "learners," and assign class labels to new data points by combining the predictions of these models. Ensemble methods are popular across a wide range of problem domains because of their strong performance. However, a theoretical understanding of when ensembles are optimal remains, in many instances, an open problem. In particular, improving the performance of an ensemble requires understanding the subtle interplay between the accuracy of the individual learners and the diversity of the learners in the ensemble. For example, if all of the learners in an ensemble were identical, then the accuracy of the ensemble could be no better than the accuracy of any individual learner, no matter how many learners one used. Accordingly, here we develop a theory, framed in terms of statistical correlations, for understanding when ensembles are optimal, in an appropriate sense, by balancing individual accuracy against ensemble diversity. The theory we derive applies to many practical ensembles, and we provide a set of metrics for assessing the optimality of any given ensemble. Perhaps most interestingly, these metrics lead naturally to a set of novel loss functions that can be optimized using backpropagation, giving rise to optimal deep neural network ensembles. We demonstrate the effectiveness of these deep neural network ensembles on standard benchmark data sets.
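To make the accuracy-versus-diversity trade-off concrete, here is a minimal sketch, in a PyTorch-style setup, of a diversity-regularized ensemble loss: each learner is trained for individual accuracy via cross-entropy, while a penalty on the pairwise correlation of the learners' per-example errors encourages diversity. The specific penalty form, the error definition, and the weight `lam` are illustrative assumptions for this sketch, not the paper's actual loss functions.

```python
# A hedged sketch (not the paper's exact loss): train each learner for
# accuracy while penalizing correlated errors across learners.
import torch
import torch.nn.functional as F

def ensemble_diversity_loss(logits_per_learner, targets, lam=0.1):
    """logits_per_learner: list of (batch, classes) tensors, one per learner.
    targets: (batch,) LongTensor of class labels.
    lam: diversity weight (an assumed hyperparameter)."""
    # Accuracy term: mean cross-entropy over the individual learners.
    ce = torch.stack(
        [F.cross_entropy(z, targets) for z in logits_per_learner]
    ).mean()

    # Diversity term: per-example error of each learner, defined here as
    # the probability mass assigned off the true class.
    probs = [F.softmax(z, dim=1) for z in logits_per_learner]
    errs = torch.stack(
        [1.0 - p[torch.arange(len(targets)), targets] for p in probs]
    )  # shape: (num_learners, batch)

    # Pairwise Pearson correlation of the centered error vectors.
    errs = errs - errs.mean(dim=1, keepdim=True)
    cov = errs @ errs.t() / errs.shape[1]
    std = cov.diag().clamp_min(1e-8).sqrt()
    corr = cov / (std[:, None] * std[None, :])

    # Penalize the mean off-diagonal correlation (identical learners
    # have correlation 1, so this pushes the ensemble toward diversity).
    n = corr.shape[0]
    off_diag = (corr.sum() - corr.diag().sum()) / max(n * (n - 1), 1)

    return ce + lam * off_diag
```

In a training loop, one would forward the same batch through every learner, compute this loss, and call `backward()` once, so backpropagation updates all learners jointly rather than training each in isolation.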
