A hybrid genetic algorithm for feature selection wrapper based on mutual information

In this study, a hybrid genetic algorithm is used to find the subset of features most relevant to the classification task. The method involves two stages of optimization. The outer stage performs a global wrapper search for the best feature subset, with the mutual information between the predicted labels of a trained classifier and the true classes serving as the fitness function of the genetic algorithm. The inner stage performs a local filter search, in which an improved estimator of the conditional mutual information ranks each candidate feature by its relevance to the output classes as well as its redundancy with the already-selected features. The two stages cooperate to achieve both high global predictive accuracy and high local search efficiency. Experimental results on a range of benchmark data sets demonstrate that the method selects parsimonious feature subsets while achieving excellent classification accuracy.
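The two-stage scheme described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: a toy nearest-centroid classifier stands in for the trained classifier, a plug-in estimator of discrete mutual information stands in for the paper's improved conditional-MI estimator, and all GA parameters (`pop`, `gens`, one-point crossover, single-bit mutation) are illustrative assumptions.

```python
import random
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def classify(X, y, mask):
    """Toy nearest-centroid predictions using only the features selected in mask."""
    feats = [i for i, bit in enumerate(mask) if bit]
    if not feats:
        return [y[0]] * len(X)
    cents = {c: [sum(row[i] for row, lab in zip(X, y) if lab == c) /
                 sum(1 for lab in y if lab == c) for i in feats]
             for c in set(y)}
    return [min(cents, key=lambda c: sum((row[i] - m) ** 2
                                         for i, m in zip(feats, cents[c])))
            for row in X]

def fitness(mask, X, y):
    """Outer (wrapper) fitness: MI between the classifier's predictions and true labels."""
    return mutual_information(classify(X, y, mask), y)

def local_refine(mask, X, y, cols):
    """Inner (filter) step: greedily switch on the feature whose relevance to the
    labels, minus its redundancy with already-selected features, is largest."""
    selected = [i for i, b in enumerate(mask) if b]
    best, best_score = None, 0.0
    for i in range(len(mask)):
        if mask[i]:
            continue
        score = mutual_information(cols[i], y) - max(
            (mutual_information(cols[i], cols[j]) for j in selected), default=0.0)
        if score > best_score:
            best, best_score = i, score
    if best is not None:
        mask = mask[:]
        mask[best] = 1
    return mask

def hybrid_ga(X, y, pop=10, gens=15, seed=0):
    """GA over binary feature masks; each generation interleaves filter
    refinement (inner stage) with wrapper selection (outer stage)."""
    rng = random.Random(seed)
    d = len(X[0])
    cols = [[row[i] for row in X] for i in range(d)]
    popn = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop)]
    for _ in range(gens):
        popn = [local_refine(m, X, y, cols) for m in popn]
        popn.sort(key=lambda m: fitness(m, X, y), reverse=True)
        elite = popn[: pop // 2]
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, d)
            child = a[:cut] + b[cut:]        # one-point crossover
            child[rng.randrange(d)] ^= 1     # single-bit mutation
            children.append(child)
        popn = elite + children
    return max(popn, key=lambda m: fitness(m, X, y))
```

On a toy data set where one binary feature fully determines the label, the sketch reliably keeps that feature in the final mask, since both the filter score and the wrapper fitness reward it.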
