When Does Diversity Help Generalization in Classification Ensembles?

Ensembles are a widely used and effective technique in machine learning, and their success hinges on a key element: diversity. The relationship between diversity and generalization, however, is not fully understood and remains an open research issue. To reveal the effect of diversity on the generalization of classification ensembles, we investigate three questions: how to measure diversity, how the measured diversity relates to the generalization error, and how to exploit this relationship for ensemble pruning. For the measurement, we derive a diversity measure from an error decomposition inspired by regression ensembles, which splits the error of a classification ensemble into an accuracy term and a diversity term. We then formulate the relationship between the measured diversity and ensemble performance through margin-based generalization theory, and observe that the generalization error decreases effectively only when diversity increases within a few specific ranges, whereas in other ranges greater diversity contributes little to generalization. Finally, we propose a pruning method based on diversity management that exploits this relationship, increasing diversity appropriately while shrinking the ensemble without degrading performance. Experiments validate the proposed relationship between the measured diversity and the ensemble generalization error.
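
The abstract does not reproduce the classification decomposition itself; for context, the regression ambiguity decomposition of Krogh and Vedelsby, which the authors cite as their inspiration, reads as follows (with convex weights $w_i \ge 0$, $\sum_i w_i = 1$, and ensemble output $\bar{f} = \sum_i w_i f_i$):

$$(\bar{f} - y)^2 \;=\; \underbrace{\sum_i w_i \,(f_i - y)^2}_{\text{avg.\ individual error}} \;-\; \underbrace{\sum_i w_i \,(f_i - \bar{f})^2}_{\text{ambiguity (diversity)}}$$

The ambiguity term is non-negative, so the ensemble error never exceeds the weighted average individual error and shrinks as the members disagree more; the paper's contribution is an analogous accuracy/diversity split for classification ensembles.

The abstract also gives no algorithmic detail for the pruning method, so the following is only a minimal sketch of generic backward-elimination pruning under a non-decreasing-performance constraint; all names (greedy_prune, ensemble_error, preds) are illustrative assumptions, and the paper's actual selection criterion is its diversity measure rather than raw validation error.

    import numpy as np

    def ensemble_error(preds, y):
        """0-1 error of the majority vote over preds (n_models x n_samples), labels in {0, 1}."""
        vote = (preds.mean(axis=0) >= 0.5).astype(int)
        return float(np.mean(vote != y))

    def greedy_prune(preds, y, min_size=3):
        """Drop one member at a time as long as the validation error does not increase."""
        kept = list(range(preds.shape[0]))
        best = ensemble_error(preds[kept], y)
        improved = True
        while improved and len(kept) > min_size:
            improved = False
            for i in list(kept):
                trial = [j for j in kept if j != i]
                err = ensemble_error(preds[trial], y)
                if err <= best:  # non-decreasing-performance constraint
                    kept, best, improved = trial, err, True
                    break
        return kept, best

Because each accepted removal strictly shrinks the ensemble, the loop terminates, and the returned subset is never worse than the full ensemble on the held-out set used for y.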
