Feature selection algorithm for mixed data with both nominal and continuous features

Feature selection is a crucial step in pattern recognition. Most reported feature selection algorithms are developed for continuous features. In this paper, we propose a feature selection algorithm for mixed-type data containing both continuous and nominal features. The algorithm consists of a novel criterion for evaluating mixed feature subsets and a novel search algorithm for generating candidate subsets. The proposed algorithm is tested on both artificial and real-world problems.
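To make the two components concrete, the following is a minimal sketch of the general evaluate-and-search pattern on mixed data. It is not the criterion or the search algorithm proposed in this paper, neither of which is specified in the abstract. As stand-ins it assumes a heterogeneous distance (overlap for nominal features, range-normalized difference for continuous ones), leave-one-out 1-nearest-neighbour accuracy as the subset evaluation criterion, and greedy sequential forward selection as the subset generator. All function names and the toy data are hypothetical.

```python
def mixed_distance(a, b, subset, nominal, ranges):
    """Heterogeneous distance over the features in `subset` (assumed stand-in)."""
    total = 0.0
    for j in subset:
        if j in nominal:
            # Overlap metric: 0 if the categories match, 1 otherwise.
            d = 0.0 if a[j] == b[j] else 1.0
        else:
            # Range-normalized absolute difference for continuous features.
            d = abs(a[j] - b[j]) / ranges[j] if ranges[j] > 0 else 0.0
        total += d * d
    return total ** 0.5


def loo_1nn_accuracy(X, y, subset, nominal, ranges):
    """Subset evaluation: leave-one-out accuracy of a 1-NN classifier."""
    correct = 0
    for i, row in enumerate(X):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: mixed_distance(row, X[k], subset, nominal, ranges),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_select(X, y, nominal):
    """Subset generation: greedy sequential forward selection."""
    n_features = len(X[0])
    ranges = {
        j: max(r[j] for r in X) - min(r[j] for r in X)
        for j in range(n_features)
        if j not in nominal
    }
    selected, best_score = [], -1.0
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        score, j = max(
            (loo_1nn_accuracy(X, y, selected + [j], nominal, ranges), j)
            for j in candidates
        )
        if score <= best_score:
            break  # no candidate improves the criterion; stop searching
        selected.append(j)
        best_score = score
    return selected, best_score


# Toy data: feature 0 is continuous noise, feature 1 is an informative nominal feature.
X = [
    [0.1, "red"], [0.9, "red"], [0.5, "red"],
    [0.2, "blue"], [0.8, "blue"], [0.4, "blue"],
]
y = [0, 0, 0, 1, 1, 1]
selected, score = forward_select(X, y, nominal={1})
print(selected, score)
```

On this toy set the search keeps only the nominal feature, because adding the noisy continuous feature does not improve the criterion. Any other mixed-data criterion or search strategy could be substituted for the two stand-ins without changing the overall structure.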
