
The aim of Feature Subset Selection (FSS) algorithms is to select, from the original set of features describing a data set, a subset of features according to some importance criterion. To accomplish this, FSS removes irrelevant and/or redundant features, since they may decrease data quality and degrade several desired properties of classifiers induced by supervised learning algorithms. Because finding the best subset of features is an NP-hard problem, FSS algorithms generally rely on heuristics to select subsets, and it is therefore important to evaluate their performance empirically. This evaluation, however, needs to be multicriteria, i.e., it should take several properties into account. This work describes a simple model we have proposed to evaluate FSS algorithms that considers two properties: the predictive performance of the classifier induced using the subset of features selected by different FSS algorithms, and the reduction in the number of features. A second multicriteria evaluation model, based on rankings, which makes it possible to consider any number of properties, is also presented. Both models are illustrated by applying them to four well-known FSS algorithms and to two versions of a new FSS algorithm we have developed.
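
The ranking-based model is only summarized above; as a rough sketch of the general idea, the Python example below ranks FSS algorithms on two criteria, predictive accuracy and feature reduction, and aggregates the per-criterion rankings by their mean. The algorithm names, scores, and mean-rank aggregation are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

# Hypothetical per-algorithm results: accuracy of the classifier induced on the
# selected subset, and the fraction of original features retained.
# Names and numbers are invented for illustration only.
results = {
    "CFS":     {"accuracy": 0.86, "retained": 0.40},
    "FCBF":    {"accuracy": 0.84, "retained": 0.25},
    "LVF":     {"accuracy": 0.82, "retained": 0.35},
    "Wrapper": {"accuracy": 0.88, "retained": 0.55},
}

def rank(values, higher_is_better=True):
    """Return 1-based ranks for `values`; ties receive the average rank."""
    order = sorted(values, reverse=higher_is_better)
    return [np.mean([i + 1 for i, v in enumerate(order) if v == x]) for x in values]

algos = list(results)
# Higher accuracy is better; a smaller retained fraction (greater reduction) is better.
acc_ranks = rank([results[a]["accuracy"] for a in algos], higher_is_better=True)
red_ranks = rank([results[a]["retained"] for a in algos], higher_is_better=False)

# Aggregate the two criterion rankings by their mean; lower mean rank is better.
for a, r1, r2 in sorted(zip(algos, acc_ranks, red_ranks),
                        key=lambda t: (t[1] + t[2]) / 2):
    print(f"{a}: accuracy rank {r1:.1f}, reduction rank {r2:.1f}, "
          f"mean rank {(r1 + r2) / 2:.1f}")
```

Because each criterion contributes only its ranking, the scheme extends directly to any number of properties (e.g., training time) by adding further rank lists to the aggregation.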
