META-DES.Oracle: Meta-learning and feature selection for dynamic ensemble selection

Abstract

Dynamic ensemble selection (DES) techniques estimate the competence level of each classifier in a pool and select only the most competent ones to classify a specific test sample. The key issue in DES is defining a suitable criterion for computing the classifiers' competence. Several criteria are available for measuring the competence of the base classifiers, such as local accuracy estimates and ranking. However, relying on a single criterion may lead to a poor estimate of a classifier's competence. To address this issue, we previously proposed a dynamic ensemble selection framework based on meta-learning, called META-DES. In META-DES, a meta-classifier is trained on meta-features extracted from the training data to estimate the competence of a base classifier for classifying a given query sample. An important aspect of the framework is that multiple criteria can be embedded in the system, each encoded as a different set of meta-features. However, not every DES criterion suits every classification problem. For instance, local accuracy estimates may produce poor results when the classes overlap heavily. Moreover, higher classification accuracy can be obtained if the meta-classifier is optimized for the data at hand. In this paper, we propose a new version of the META-DES framework based on the formal definition of the Oracle, called META-DES.Oracle. The Oracle is an abstract method that represents an ideal classifier selection scheme: it always picks a classifier that predicts the correct label, whenever such a classifier exists in the pool. To improve the performance of the meta-classifier, we propose a meta-feature selection scheme based on an overfitting-cautious Binary Particle Swarm Optimization (BPSO) that minimizes the difference between the outputs of the meta-classifier and those of the Oracle. The meta-classifier is therefore expected to produce results similar to those of the Oracle.
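The two notions of competence contrasted above can be made concrete with a short sketch. This is not the META-DES.Oracle implementation. It assumes that base classifiers are plain Python callables and that a labeled selection set is available; the names `X_sel` and `Y_sel` are our own. It computes three things:

- the Oracle's view of a sample: a classifier counts as competent if it predicts the true label;
- the Oracle accuracy of the pool;
- a simple local-accuracy competence, meaning accuracy on the k nearest neighbors of the query.

All function names (`oracle_labels`, `oracle_accuracy`, `local_accuracy`, `select_competent`) are hypothetical, and the tie-keeping selection rule is just one simple choice.

```python
import math
from typing import Callable, List, Sequence

# A base classifier is modeled as a plain callable: feature vector -> label.
Classifier = Callable[[Sequence[float]], int]


def oracle_labels(pool: List[Classifier], x: Sequence[float], y: int) -> List[int]:
    """Oracle view of one labeled sample: 1 for each classifier that
    predicts the true label y, 0 otherwise."""
    return [1 if clf(x) == y else 0 for clf in pool]


def oracle_accuracy(pool: List[Classifier], X, Y) -> float:
    """Fraction of samples for which at least one classifier in the pool
    is correct, i.e. the accuracy of the ideal (Oracle) selector."""
    hits = sum(1 for x, y in zip(X, Y) if any(oracle_labels(pool, x, y)))
    return hits / len(X)


def local_accuracy(pool: List[Classifier], X_sel, Y_sel, query, k: int) -> List[float]:
    """Local-accuracy competence: each classifier's accuracy on the k samples
    of the labeled selection set closest (Euclidean distance) to the query."""
    region = sorted(range(len(X_sel)), key=lambda i: math.dist(X_sel[i], query))[:k]
    return [sum(clf(X_sel[i]) == Y_sel[i] for i in region) / k for clf in pool]


def select_competent(pool: List[Classifier], competences: Sequence[float]) -> List[Classifier]:
    """Keep every classifier that ties for the highest competence."""
    best = max(competences)
    return [clf for clf, c in zip(pool, competences) if c == best]


# Usage on a toy 1-D problem (hypothetical data and classifiers).
c0 = lambda x: int(x[0] > 0.5)
c1 = lambda x: int(x[0] > 2.5)
pool = [c0, c1]
X = [[0.0], [1.0], [2.0], [3.0]]
Y = [1, 1, 0, 1]
print(oracle_accuracy(pool, X, Y))                 # 0.75 (no classifier gets [0.0] right)
print(local_accuracy(pool, X, Y, [2.2], k=2))      # [0.5, 1.0]
```

Here the local-accuracy criterion prefers `c1` near the query. The Oracle instead only asks whether some classifier is right, which is why it acts as an upper bound on what any selection criterion can achieve.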
Experiments on 30 classification problems demonstrate that the optimization procedure based on the Oracle definition significantly improves classification accuracy compared with previous versions of the META-DES framework and other state-of-the-art DES techniques.
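The optimization procedure described above can be sketched as a binary PSO over meta-feature masks. Each bit keeps (1) or drops (0) one meta-feature.

This sketch rests on several assumptions, none of which are taken from the paper:

- **Transfer function.** It uses the classic sigmoid (S-shaped) rule: a bit is set to 1 with probability `sigmoid(velocity)`. The exact BPSO variant used in META-DES.Oracle is not specified here.
- **Measuring disagreement.** The "difference from the Oracle" is taken to be the fraction of competent/incompetent decisions on which the meta-classifier and the Oracle disagree.
- **Overfitting caution.** In the spirit of overfitting-cautious selection, the swarm optimizes on one set. After every iteration, its global best is re-scored on a separate validation set, and the mask with the lowest validation error is the one returned.

The callables `fit_opt` and `fit_val` are hypothetical. In practice they would train the meta-classifier on the selected meta-features and return `oracle_disagreement` on the optimization and validation sets, respectively. They must also handle an all-zero mask.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]


def oracle_disagreement(meta_outputs: Sequence[int], oracle: Sequence[int]) -> float:
    """Fraction of (classifier, sample) decisions where the meta-classifier's
    competent/incompetent output differs from the Oracle's."""
    return sum(m != o for m, o in zip(meta_outputs, oracle)) / len(oracle)


def bpso_select(n_features: int,
                fit_opt: Callable[[Mask], float],
                fit_val: Callable[[Mask], float],
                n_particles: int = 10, n_iter: int = 30,
                w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                v_max: float = 4.0, seed: int = 0) -> Tuple[Mask, float]:
    """Sigmoid binary PSO minimizing fit_opt; returns the global-best mask
    with the lowest fit_val seen across iterations (validation archive)."""
    rng = random.Random(seed)
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fit_opt(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    # Overfitting-cautious archive: best solution as judged on validation data.
    best_mask, best_val = gbest[:], fit_val(gbest)
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fit_opt(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        val = fit_val(gbest)
        if val < best_val:
            best_mask, best_val = gbest[:], val
    return best_mask, best_val
```

Returning the validation-archived mask rather than the final global best means the selection is not judged solely on the data it was optimized on. This is the overfitting concern the procedure is meant to address.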
