A filter model for feature subset selection based on genetic algorithm

This paper describes a novel feature subset selection algorithm that uses a genetic algorithm (GA) to optimize the output nodes of a trained artificial neural network (ANN). The algorithm neither depends on the ANN training algorithm nor modifies the training results. After the ANN is trained on a given database, the two groups of weights, between the input and hidden layers and between the hidden and output layers, are extracted. A general formula is then generated for each output node (class) of the ANN. Because the two groups of weights are constant, this formula depends only on the input features, and the dependency is represented by a non-linear exponential function. The GA is then used to find the relevant features that maximize the output function for each class. The features that are dominant across all classes form the subset selected from the input feature group.
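The procedure summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: a single hidden layer with sigmoid activations (one common form of a "non-linear exponential function"), a binary chromosome in which bit i switches input feature i on or off, a simple GA with tournament selection, one-point crossover, bit-flip mutation and elitism, and a majority vote across classes as the reading of "dominant features". All function names, parameters, and the toy weights are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function; assumes moderate |z| (very negative z would overflow)."""
    return 1.0 / (1.0 + math.exp(-z))


def class_output(x, w_ih, b_h, w_ho, b_o, k):
    """Output of class node k as a function of the inputs x only.

    The extracted weights (w_ih, b_h, w_ho, b_o) stay fixed, so x is the
    only free variable, as described in the abstract.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_ih, b_h)]
    return sigmoid(sum(w * h for w, h in zip(w_ho[k], hidden)) + b_o[k])


def ga_maximize(fitness, n_bits, rng, pop_size=30, generations=40, p_mut=0.05):
    """Small GA over bit strings (assumed encoding); requires n_bits >= 2."""
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            # Tournament selection: the better of two random individuals.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


def select_features(w_ih, b_h, w_ho, b_o, seed=0):
    """Run the GA once per class, then keep features chosen for most classes.

    The strict-majority vote is an assumed interpretation of "dominant features".
    """
    rng = random.Random(seed)
    n_features = len(w_ih[0])
    n_classes = len(w_ho)
    votes = [0] * n_features
    for k in range(n_classes):
        best = ga_maximize(
            lambda m: class_output(m, w_ih, b_h, w_ho, b_o, k), n_features, rng)
        for i, bit in enumerate(best):
            votes[i] += bit
    return [i for i, v in enumerate(votes) if 2 * v > n_classes]


# Toy weights (hypothetical): 4 input features, 2 hidden nodes, 2 classes.
W_IH = [[2.0, -1.0, 0.5, 0.0], [-1.5, 1.0, 0.0, 0.5]]
B_H = [0.0, 0.0]
W_HO = [[3.0, -3.0], [-3.0, 3.0]]
B_O = [0.0, 0.0]

print(select_features(W_IH, B_H, W_HO, B_O, seed=1))
```

Because the fitness function only evaluates the fixed formula for each output node, the search never retrains or modifies the network, which is consistent with the filter character of the approach. With real data the weights would be taken from the trained ANN rather than written by hand, and the chromosome encoding and voting rule would need to follow the paper's own definitions.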
