Diversity in Combinations of Heterogeneous Classifiers

In this paper, we introduce the use of combinations of heterogeneous classifiers to achieve greater diversity. Through theoretical and empirical analyses of the diversity of such combinations, we study the relationship between heterogeneity and diversity. On the one hand, the theoretical analysis serves as a foundation for employing heterogeneous classifiers in Multi-Classifier Systems and ensembles. On the other hand, experimental results provide empirical support for the theoretical findings. We consider both synthetic and real data sets, use classification algorithms that are essentially different from one another, and evaluate with several popular diversity measures. Two observations emerge that can inform the future design of Multi-Classifier Systems and ensemble techniques. First, the diversity among heterogeneous classifiers is higher than that among homogeneous ones; constructing classifier combinations from heterogeneous classifiers therefore increases diversity. Second, the heterogeneity primarily results from using different classification algorithms rather than from the same algorithm with different parameter settings.
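To make the comparison concrete, the sketch below contrasts a heterogeneous combination (essentially different algorithms) with a homogeneous one (the same algorithm with varied parameters) using mean pairwise disagreement, one common pairwise diversity measure. This is a minimal illustration under assumptions, not the paper's experimental code: the synthetic data, the particular classifiers (decision tree, naive Bayes, k-NN from scikit-learn), and the choice of disagreement as the diversity measure are illustrative.

```python
# Minimal sketch (assumptions, not the paper's code): compare the mean
# pairwise disagreement of a heterogeneous vs. a homogeneous combination.
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def disagreement(pred_a, pred_b):
    # Pairwise disagreement: fraction of test samples on which the two
    # classifiers predict different labels.
    return np.mean(pred_a != pred_b)


def mean_pairwise_disagreement(classifiers, X_train, y_train, X_test):
    # Fit each classifier, collect its test predictions, and average the
    # disagreement over all classifier pairs in the combination.
    preds = [clf.fit(X_train, y_train).predict(X_test) for clf in classifiers]
    return np.mean([disagreement(a, b) for a, b in combinations(preds, 2)])


# Synthetic data stands in for the paper's data sets (an assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Heterogeneous: essentially different classification algorithms.
hetero = [DecisionTreeClassifier(random_state=0), GaussianNB(), KNeighborsClassifier()]
# Homogeneous: one algorithm with different parameter settings.
homo = [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (3, 6, None)]

print("heterogeneous:", mean_pairwise_disagreement(hetero, X_tr, y_tr, X_te))
print("homogeneous:  ", mean_pairwise_disagreement(homo, X_tr, y_tr, X_te))
```

On typical runs the heterogeneous combination yields the higher mean pairwise disagreement, in line with the paper's first observation, though the exact values depend on the data and the parameter choices above.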
