Feature Subset Selection using Cascaded GA and CFS: A Filter Approach in Supervised Learning

Medical data mining has enormous potential for uncovering hidden patterns in medical data sets. Physicians can use these patterns to improve clinical diagnosis. Feature subset selection is one of the data preprocessing steps and is of immense importance in data mining. For the feature subset selection step of data preprocessing, a filter approach has been used that applies a genetic algorithm (GA) and correlation-based feature selection (CFS) in a cascaded fashion. The GA performs a global search over the attributes, and CFS evaluates the fitness of each candidate subset. Experimental results show that the feature subset identified by the proposed GA+CFS filter, when given as input to five classifiers (decision tree, Naive Bayes, Bayesian, radial basis function and k-nearest neighbor classifiers), yields improved classification accuracy. Experiments have been carried out on four medical data sets that are publicly available from the UCI Machine Learning Repository.
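The abstract describes the GA+CFS cascade only at a high level. Below is a minimal sketch of the idea: a GA searches over bit-mask feature subsets, and a CFS-style merit scores each candidate. CFS rates a subset of k features as k·r_cf / sqrt(k + k(k−1)·r_ff). Here r_cf is the mean feature-class correlation, which rewards relevance, and r_ff is the mean feature-feature intercorrelation, which penalises redundancy. The sketch makes several assumptions. It uses absolute Pearson correlation as the correlation measure, which presumes numeric features and a numerically coded class (Hall's CFS typically uses symmetrical uncertainty for nominal attributes). The function names `pearson`, `cfs_merit` and `ga_cfs` are hypothetical, and so are the GA settings (population 20, 30 generations, crossover 0.6, mutation 0.033, tournament selection, single-point crossover, elitism). These are illustrative choices, not the paper's configuration. The classifier stage that consumes the selected subset is omitted.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, target):
    """CFS-style merit k*r_cf / sqrt(k + k(k-1)*r_ff), using |Pearson r| (assumption)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # mean feature-class correlation: rewards relevant features
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    # mean feature-feature correlation: penalises redundant features
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_cfs(columns, target, pop_size=20, generations=30,
           p_cross=0.6, p_mut=0.033, seed=0):
    """Genetic search over feature subsets with CFS merit as fitness (hypothetical settings)."""
    rng = random.Random(seed)
    n = len(columns)

    def fitness(mask):
        return cfs_merit([i for i in range(n) if mask[i]], columns, target)

    # each chromosome is a bit mask: True means the feature is selected
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(m) for m in pop]
        best = pop[scores.index(max(scores))]

        def select():  # binary tournament selection on the current population
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = []
        while len(children) < pop_size - 1:
            p1, p2 = select(), select()
            if n > 1 and rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, n)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                # bit-flip mutation applied independently to every gene
                children.append([(not g) if rng.random() < p_mut else g
                                 for g in child])
        # elitism: carry the best chromosome into the next generation unchanged
        pop = children[:pop_size - 1] + [best[:]]
    best = max(pop, key=fitness)
    return [i for i in range(n) if best[i]], fitness(best)


if __name__ == "__main__":
    # toy data: class depends on x0 and x1, x2 is noise, x3 duplicates x0
    x0 = [1, 2, 3, 4, 5, 6, 7, 8]
    x1 = [2, 1, 4, 3, 6, 5, 8, 7]
    x2 = [5, 3, 8, 1, 7, 2, 6, 4]
    y = [a + b for a, b in zip(x0, x1)]
    print(ga_cfs([x0, x1, x2, list(x0)], y))
```

The merit function is why the cascade works as a filter. CFS scores a subset only from correlations in the data, without training a classifier. Because of the redundancy term, adding an exact duplicate of a selected feature leaves the merit unchanged rather than increasing it.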
