A genetic-based approach to features selection for ensembles using a hybrid and adaptive fitness function

Recent research on feature selection has sought efficient methods for the automatic selection of relevant features. The idea is to select a subset of attributes that is as representative as possible of the original data. Committees of classifiers, also known as ensemble systems, are composed of individual classifiers organized in parallel, whose outputs are merged by a combination method that provides the final output of the system. In the context of these systems, feature selection methods can be used to provide different subsets of attributes for the individual classifiers, aiming to reduce redundancy among the attributes of a pattern and to increase the diversity of the ensemble. There are several methods for selecting features in ensemble systems, and genetic algorithms (GAs) are among the most widely used. The main problem in using a GA is the choice of the fitness function: using the ensemble accuracy (a wrapper approach) makes the search complex and time-consuming, whereas filter approaches may not reflect the real quality of a solution. In this paper, we use feature selection via a genetic algorithm to generate different feature subsets for the individual classifiers. Our proposal uses a hybrid and adaptive fitness function that considers both approaches, filter and wrapper. To evaluate the proposal, experiments were conducted involving 10 different types of machine learning algorithms on 14 datasets. We analyse the performance of the proposed model in comparison with a genetic algorithm using a filter approach, as well as with the standard Bagging algorithm without feature selection.
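The abstract does not give the form of the hybrid and adaptive fitness function, so the following is only a minimal sketch of the general idea, not the method of the paper. It assumes a linear blend of a filter score and a wrapper score whose weight `alpha` shifts from filter to wrapper as the generations advance. The filter criterion (mean absolute Pearson correlation with the label), the wrapper criterion (leave-one-out 1-nearest-neighbour accuracy), the linear schedule, and every function name here are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def filter_score(X, y, mask):
    """Assumed filter criterion: mean |correlation| of selected features with the label."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    return mean(abs(pearson([row[j] for row in X], y)) for j in cols)


def wrapper_score(X, y, mask):
    """Assumed wrapper criterion: leave-one-out accuracy of 1-NN on the selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    hits = 0
    for i, row in enumerate(X):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in cols),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def hybrid_fitness(X, y, mask, generation, max_generations):
    """Blend both criteria; alpha grows linearly from 0 (filter only) to 1 (wrapper only)."""
    alpha = generation / max_generations
    f = filter_score(X, y, mask)
    # Skip the expensive wrapper evaluation while its weight is zero.
    w = wrapper_score(X, y, mask) if alpha > 0 else 0.0
    return (1 - alpha) * f + alpha * w


def evolve(X, y, pop_size=12, max_generations=10, seed=0):
    """Tiny GA over bit masks (needs at least two features); returns the best mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for gen in range(max_generations + 1):
        ranked = sorted(
            pop,
            key=lambda m: hybrid_fitness(X, y, m, gen, max_generations),
            reverse=True,
        )
        if gen == max_generations:
            return ranked[0]
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # occasional bit-flip mutation
                child[rng.randrange(n)] ^= 1
            children.append(child)
        pop = parents + children
```

To obtain different subsets for the individual classifiers of an ensemble, one option under the same assumptions is to call `evolve` once per ensemble member with a different `seed`, then train each member on the columns its mask selects.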
