Two Simple and Effective Feature Selection Methods for Continuous Attributes with Discrete Multi-class

We present two easy-to-implement feature selection methods inspired by Shannon's entropy and the information gain measure. These methods apply when the database has continuous attributes and a discrete multi-class label. The first method applies when the attributes are conditionally independent of one another given the class; the second is useful when we suspect interdependencies among the attributes. In our experiments with synthetic and real databases, the proposed methods proved fast and produced near-optimum solutions with a good feature reduction ratio.
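The abstract does not spell out the methods' details, but the general idea of entropy- and information-gain-based selection for continuous attributes can be sketched as follows. This is a minimal illustrative implementation, not the paper's algorithm: it assumes equal-width discretization of each continuous attribute (one of several possible choices) and ranks attributes by their estimated information gain with respect to the class; the function names are our own.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels, n_bins=10):
    """Information gain of one continuous feature w.r.t. the class,
    estimated via equal-width discretization (an illustrative choice,
    not necessarily the discretization used in the paper)."""
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    bins = np.digitize(feature, edges)
    conditional = 0.0
    for b in np.unique(bins):
        mask = bins == b
        # Weight each bin's class entropy by the bin's probability.
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional

def rank_features(X, y, n_bins=10):
    """Rank the columns of X by information gain, highest first."""
    gains = np.array([information_gain(X[:, j], y, n_bins)
                      for j in range(X.shape[1])])
    return np.argsort(gains)[::-1], gains
```

Under the first method's assumption (attributes conditionally independent given the class), such a per-attribute ranking suffices; when interdependencies exist, a multivariate criterion is needed, which motivates the second method.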
