A Hybrid Genetic Algorithm With Wrapper-Embedded Approaches for Feature Selection

Feature selection is an important research area in big data analysis. In recent years, various feature selection approaches have been developed, which can be divided into four categories: filter, wrapper, embedded, and combined methods. In the combined category, many hybrid genetic approaches from evolutionary computation combine filter and wrapper measures of feature evaluation to implement a population-based global optimization with efficient local search. However, existing combined methods have three limitations: their feature evaluation measures are two-stage and inconsistent, they have difficulty analyzing data with high feature interaction, and they struggle to handle large numbers of features and instances. To address these three limitations, we propose a hybrid genetic algorithm with wrapper-embedded approaches for feature selection (HGAWE), which combines a genetic algorithm (global search) with embedded regularization approaches (local search). We also propose a novel chromosome representation (intron+exon) for the global and local optimization procedures in HGAWE. Based on this “intron+exon” encoding, the regularization method selects the relevant features and constructs the learning model simultaneously, while the genetic operations globally optimize the control parameters of the non-convex regularization. Any efficient regularization approach can serve as the embedded method in HGAWE; a hybrid $L_{1/2}+L_{2}$ regularization approach is investigated as an example in this paper. An empirical study of HGAWE on simulated data and five gene microarray data sets indicates that it outperforms the existing combined methods in terms of feature selection and classification accuracy.
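The division of labor described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the embedded method is a plain elastic net (L1 + L2) solved by coordinate descent rather than the hybrid $L_{1/2}+L_{2}$ penalty, the loss is squared error on synthetic regression data rather than a classification loss, and the "intron" is read as the pair of control parameters (`lam`, `alpha`) while the "exon" is the coefficient vector fitted for them. All function names are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by L1-type coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_cd(X, y, lam, alpha, sweeps=30):
    """Local search (assumed stand-in for the embedded method): minimize
    (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # correlation of feature j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of a linear model."""
    return sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2
               for row, yi in zip(X, y)) / len(y)

def evaluate(intron, train, valid):
    """Fit the exon (coefficients) for one intron (lam, alpha); score on validation data."""
    exon = elastic_net_cd(train[0], train[1], intron[0], intron[1])
    return mse(valid[0], valid[1], exon), exon, intron

def hybrid_ga(train, valid, pop_size=8, generations=6, seed=0):
    """Global search: a small GA over the control parameters (lam, alpha)."""
    rng = random.Random(seed)
    pop = [(rng.uniform(0.001, 1.0), rng.uniform(0.0, 1.0)) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = sorted((evaluate(ind, train, valid) for ind in pop), key=lambda s: s[0])
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        parents = [s[2] for s in scored[: pop_size // 2]]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()  # arithmetic crossover weight
            lam = w * a[0] + (1 - w) * b[0] + rng.gauss(0, 0.05)    # Gaussian mutation
            alpha = w * a[1] + (1 - w) * b[1] + rng.gauss(0, 0.05)
            children.append((min(max(lam, 1e-4), 1.0), min(max(alpha, 0.0), 1.0)))
        pop = parents + children
    return best  # (validation MSE, exon, intron)

def make_data(n, p, true_beta, rng):
    """Synthetic linear data with Gaussian features and small noise."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 0.1) for row in X]
    return X, y

data_rng = random.Random(1)
true_beta = [3.0, -2.0] + [0.0] * 8  # only features 0 and 1 are relevant
train = make_data(60, 10, true_beta, data_rng)
valid = make_data(40, 10, true_beta, data_rng)
fitness, exon, intron = hybrid_ga(train, valid)
selected = [j for j, b in enumerate(exon) if b != 0.0]
print("control parameters (lam, alpha):", intron)
print("selected features:", selected, "validation MSE:", fitness)
```

In this sketch each individual is scored by a single measure (validation error of the fitted model), so feature evaluation and model construction happen in one stage; swapping in another regularizer only requires replacing `elastic_net_cd`.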
