Distant diversity in dynamic class prediction

Instead of using the same ensemble for all data instances, recent studies have focused on dynamic ensembles, in which a new ensemble is chosen from a pool of classifiers for each new data instance. Classifier agreement in the region where a new data instance resides has been considered a major factor in dynamic ensembles. We postulate that the classifiers chosen for a dynamic ensemble should behave similarly in the region in which the new instance resides, but differently outside of it. In other words, we hypothesize that high local accuracy, combined with high diversity in other regions, is desirable. To test this hypothesis we propose two approaches. The first finds the k nearest data instances to the new instance, which define a neighborhood, and simultaneously maximizes local accuracy within that neighborhood and distant diversity, computed on data instances outside of it. The second uses an alternative definition of the neighborhood in which all data instances belong to it, but the weight each instance carries for accuracy and diversity depends on its distance to the new instance. Several experiments demonstrate that distance-based diversity and accuracy outperform all benchmark methods.

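The following Python sketch illustrates the first approach under stated assumptions: the pool is a list of already-trained scikit-learn classifiers, pairwise disagreement is used as the diversity measure, and the trade-off weight alpha, neighborhood size k, and ensemble size n_select are illustrative choices rather than the paper's exact formulation.

```python
# Minimal sketch of k-NN-based dynamic ensemble selection that combines
# local accuracy (inside the neighborhood of the new instance) with
# distant diversity (disagreement outside the neighborhood).
# Assumptions: pool is a list of fitted classifiers; X_val, y_val are a
# labeled validation set as numpy arrays; x_new is a 1-D feature vector.
import numpy as np
from itertools import combinations
from sklearn.neighbors import NearestNeighbors


def select_ensemble(pool, X_val, y_val, x_new, k=7, n_select=5, alpha=0.5):
    """Pick n_select classifiers that are accurate on the k validation
    instances nearest to x_new and disagree on the remaining, distant ones."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_val)
    neigh_idx = nn.kneighbors(x_new.reshape(1, -1), return_distance=False)[0]
    distant_idx = np.setdiff1d(np.arange(len(X_val)), neigh_idx)

    # Cache each classifier's predictions on the whole validation set.
    preds = np.array([clf.predict(X_val) for clf in pool])

    # Local accuracy: fraction correct on the k nearest neighbors.
    local_acc = (preds[:, neigh_idx] == y_val[neigh_idx]).mean(axis=1)

    best_score, best_subset = -np.inf, None
    for subset in combinations(range(len(pool)), n_select):
        # Distant diversity: mean pairwise disagreement outside the
        # neighborhood, averaged over all classifier pairs in the subset.
        div = np.mean([
            (preds[i, distant_idx] != preds[j, distant_idx]).mean()
            for i, j in combinations(subset, 2)
        ])
        score = alpha * local_acc[list(subset)].mean() + (1 - alpha) * div
        if score > best_score:
            best_score, best_subset = score, subset
    return [pool[i] for i in best_subset]
```

For the second approach, the hard neighborhood split would be replaced by distance-dependent weights, e.g., a kernel that up-weights nearby instances in the accuracy term and distant instances in the diversity term. Note also that the exhaustive subset search above is only feasible for small pools; a greedy or evolutionary search would be substituted in practice.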