An Improved Binary Differential Evolution Algorithm for Feature Selection in Molecular Signatures

Discovering biomarkers from high-dimensional data is a challenging task in cancer diagnosis. On the one hand, biomarker discovery is a so-called high-dimensional, small-sample problem; on the other hand, the data are redundant and noisy. In recent years, biomarker discovery from high-throughput biological data has become an increasingly important topic in bioinformatics. In this study, we propose a binary differential evolution algorithm for feature selection. First, we use a two-stage approach in which three filter methods (the Fisher score, T-statistics, and information gain) generate the feature pool that is passed to differential evolution (DE). Second, to improve the performance of DE for feature selection, we propose a new binary DE variant called BDE, which incorporates three optimization strategies: a heuristic method in the initialization stage, self-adaptive parameter control, and a minimum change value that improves exploration and thus enhances diversity. Finally, a support vector machine (SVM) is used as the classifier and evaluated with 10-fold cross-validation. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed algorithm. In addition, the BDE developed in this study has considerable potential for other feature selection problems.
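The two-stage idea described above can be illustrated with a rough sketch. This is not the BDE proposed in the study, and it makes several simplifying assumptions: only the Fisher score is used in the filter stage, individuals are initialized at random, F and CR are fixed rather than self-adaptive, the mutation is a generic XOR-based binary operator, and the fitness is the training accuracy of a nearest-centroid classifier standing in for SVM with 10-fold cross-validation. All function names are hypothetical.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero

def filter_pool(X, y, k):
    """Stage 1 (assumed form): indices of the k highest-scoring features."""
    return np.argsort(fisher_score(X, y))[::-1][:k]

def centroid_accuracy(X, y, mask):
    """Stand-in fitness: nearest-centroid training accuracy on selected features."""
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())

def binary_de(X, y, pop_size=20, generations=30, F=0.5, CR=0.9, seed=0):
    """Stage 2 (generic binary DE, not the paper's BDE): search over bit masks."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5  # random boolean masks
    fit = np.array([centroid_accuracy(X, y, ind) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.choice(others, size=3, replace=False)
            # Binary mutation: where b and c disagree, flip a's bit with probability F.
            diff = pop[b] ^ pop[c]
            mutant = pop[a] ^ (diff & (rng.random(n) < F))
            # Binomial crossover; one position always comes from the mutant.
            cross = rng.random(n) < CR
            cross[rng.integers(n)] = True
            trial = np.where(cross, mutant, pop[i])
            f = centroid_accuracy(X, y, trial)
            if f >= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f
    best = int(fit.argmax())
    return pop[best], float(fit[best])
```

A possible usage pattern is `pool = filter_pool(X, y, 50)` followed by `mask, acc = binary_de(X[:, pool], y)`, after which `pool[mask]` gives the selected feature indices in the original data. In a setting closer to the one described here, the stand-in fitness would be replaced by cross-validated classifier accuracy.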
