Genetic algorithm with fuzzy fitness function for feature selection

Feature selection is an important preprocessing task for any pattern recognition or data mining application. Although many well-developed statistical and mathematical techniques for feature selection exist, they do not match the imprecise and incomplete nature of most real-world problems. Recently, soft computing techniques such as neurocomputing, fuzzy logic, and genetic algorithms have been gaining popularity for their remarkable ability to handle real-life data in a human-like way in an environment of uncertainty, imprecision, and implicit knowledge. In this work, a genetic algorithm (GA) is proposed for feature subset selection in conjunction with a fuzzy fitness function, that is, a fuzzy measure for evaluating the quality of a feature. GA-based feature selection algorithms are robust, but their computation time is high, especially when they use a classifier for fitness evaluation. The computationally light fuzzy fitness function separates the two stages, feature selection and classification, and thereby lowers the computation time relative to the traditional GA-based algorithm that uses classifier accuracy as the fitness function. Simulation over two data sets shows the efficiency of the proposed technique in reaching a near-optimal solution in practical problems, especially when the data set contains a large number of features.
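
The general idea of pairing a GA with a classifier-free fuzzy fitness can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the membership function, the size penalty, the GA operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), and all names and parameter values are hypothetical stand-ins, not the fuzzy measure or the settings used in this work.

```python
import math
import random


def feature_stds(X):
    """Per-feature standard deviation, used to normalize distances."""
    n, d = len(X), len(X[0])
    stds = []
    for j in range(d):
        mean = sum(row[j] for row in X) / n
        var = sum((row[j] - mean) ** 2 for row in X) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid division by zero
    return stds


def fuzzy_fitness(mask, X, y, stds, penalty=0.05):
    """Hypothetical fuzzy score of a feature subset (NOT the paper's measure).

    Each sample gets a membership in every class that decays with its
    normalized distance to the class mean in the selected subspace. The score
    is the mean normalized membership in the true class minus a size penalty.
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    classes = sorted(set(y))
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in selected]
    total = 0.0
    for x, label in zip(X, y):
        mu = {}
        for c in classes:
            d2 = sum(((x[j] - m) / stds[j]) ** 2
                     for j, m in zip(selected, means[c]))
            mu[c] = 1.0 / (1.0 + d2 / len(selected))
        total += mu[label] / sum(mu.values())
    return total / len(X) - penalty * len(selected) / len(mask)


def select_features(X, y, pop_size=20, generations=30,
                    p_cross=0.8, p_mut=0.05, seed=0):
    """Toy GA over bit masks; no classifier is trained during the search."""
    rng = random.Random(seed)
    d = len(X[0])
    stds = feature_stds(X)

    def score(mask):
        return fuzzy_fitness(mask, X, y, stds)

    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=score)[:]]  # elitism: keep the best mask
        while len(nxt) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            child = a[:]
            if d > 1 and rng.random() < p_cross:
                cut = rng.randint(1, d - 1)  # one-point crossover
                child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)
```

Calling `select_features(X, y)` on a list of feature rows `X` and class labels `y` returns a 0/1 mask over the features together with its score. Because the fitness only needs class means and distances, each evaluation is cheap compared with training and testing a classifier per candidate subset, which is the kind of saving the abstract describes.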
