Less Is More: A Comprehensive Framework for the Number of Components of Ensemble Classifiers

The number of component classifiers chosen for an ensemble greatly impacts its prediction ability. In this paper, we use a geometric framework for determining the ensemble size a priori, applicable to most existing batch and online ensemble classifiers. Only a limited number of studies examine the ensemble size for majority voting (MV) and weighted MV (WMV), and almost all of them are designed for batch mode, hardly addressing online environments. The dimensions of big data and resource limitations, in terms of time and memory, make determining the ensemble size crucial, especially in online environments. For the MV aggregation rule, our framework proves that the more strong components we add to the ensemble, the more accurate its predictions become. For the WMV aggregation rule, it proves the existence of an ideal number of components, equal to the number of class labels, under the premise that the components are completely independent of each other and sufficiently strong. While giving an exact definition of a strong and independent classifier in the context of an ensemble is challenging, our proposed geometric framework provides a theoretical explanation of diversity and its impact on prediction accuracy. We conduct a series of experimental evaluations to show the practical value of our theorems and the remaining challenges.
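To make the MV claim concrete, the following is a minimal Monte-Carlo sketch, not the paper's formal geometric argument: it assumes hypothetical, completely independent components that each predict the true label with a fixed probability (p_correct = 0.6 over three class labels, both numbers chosen only for illustration) and estimates the accuracy of plain majority voting as the ensemble grows.

    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_mv_accuracy(n_components, p_correct, n_classes, n_trials=20000):
        """Monte-Carlo estimate of majority-vote accuracy for an ensemble of
        independent components, each correct with probability p_correct and
        otherwise voting uniformly at random among the wrong classes."""
        true_label = 0
        correct = 0
        for _ in range(n_trials):
            # Each component votes for the true label with probability p_correct,
            # otherwise for a uniformly random wrong label.
            is_correct = rng.random(n_components) < p_correct
            wrong_votes = rng.integers(1, n_classes, size=n_components)
            votes = np.where(is_correct, true_label, wrong_votes)
            counts = np.bincount(votes, minlength=n_classes)
            # Ties are broken at random, as is common for plain majority voting.
            winners = np.flatnonzero(counts == counts.max())
            if rng.choice(winners) == true_label:
                correct += 1
        return correct / n_trials

    for m in (1, 3, 5, 11, 25, 51):
        acc = simulate_mv_accuracy(m, p_correct=0.6, n_classes=3)
        print(f"{m:3d} components -> MV accuracy ~ {acc:.3f}")

Under these assumptions the estimated MV accuracy increases monotonically with the number of components, which is the behavior the MV theorem describes; the WMV result, in contrast, predicts that once the number of independent, sufficiently strong components reaches the number of class labels, adding further components yields no additional benefit.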
