Kernel-based learning and feature selection analysis for cancer diagnosis

Highlights
- A novel two-step feature selection approach is proposed.
- The first step uses SVM-RFE to prefilter the genes, keeping the top-ranked 60% of genes.
- The second step uses the binary dragonfly algorithm to select an optimal subset of genes.
- The objective function is the average classification accuracy of three kernel-based classifiers.
- Numerical results show the efficacy of the proposed approach.

DNA microarray technology is a very active area of research in the molecular diagnosis of cancer. Microarray datasets contain many thousands of features but only tens to hundreds of instances, which makes the analysis and diagnosis of cancer very complex. Gene (feature) selection therefore becomes a fundamental and essential task in classifying such data. In this paper, we propose a complete cancer diagnostic process based on kernel-based learning and feature selection. First, support vector machine recursive feature elimination (SVM-RFE) is used to prefilter the genes. Second, SVM-RFE is enhanced with binary dragonfly (BDF), a recently developed metaheuristic that has never been benchmarked in the context of feature selection. The objective function is the average classification accuracy of three kernel-based learning methods. We conducted a series of experiments on six microarray datasets commonly used in the literature. The experimental results demonstrate that this approach is efficient and achieves a higher classification accuracy using a reduced number of genes.
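
The first step, SVM-RFE prefiltering that keeps 60% of the genes, can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' code: the linear SVM is a Pegasos-style subgradient solver without a bias term (so features are assumed roughly centred), one feature is eliminated per round using the squared-weight ranking criterion w_i^2, and the names `train_linear_svm` and `svm_rfe` and the parameters `lam`, `epochs` and `seed` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    No bias term is learned (assumption), so features should be roughly centred.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the regulariser
            if margin < 1.0:  # hinge loss is active: step towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, keep_fraction=0.6, **svm_kwargs):
    """Recursively drop the feature with the smallest squared SVM weight.

    Returns the original indices of the surviving features, in increasing order.
    """
    surviving = list(range(len(X[0])))
    n_keep = max(1, round(keep_fraction * len(surviving)))
    while len(surviving) > n_keep:
        X_sub = [[row[j] for j in surviving] for row in X]
        w = train_linear_svm(X_sub, y, **svm_kwargs)
        weakest = min(range(len(surviving)), key=lambda k: w[k] ** 2)
        del surviving[weakest]  # retrain on the remaining features next round
    return surviving
```

Removing a single feature per round keeps the sketch simple; with many thousands of genes, removing a fixed fraction of the lowest-ranked genes per round is a common speed-up.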
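
The objective function averages the classification accuracy of three kernel-based classifiers over a candidate gene subset. The abstract does not name the three classifiers, so the sketch below assumes hypothetical stand-ins: one kernel nearest-centroid classifier run with a linear, a quadratic polynomial, and an RBF kernel, each scored on a fixed hold-out split. The kernels, their parameters (`degree`, `coef0`, `gamma`), the hold-out protocol, and all function names are assumptions for illustration.

```python
import math


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(a, b, degree=2, coef0=1.0):
    return (linear_kernel(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma=0.5):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_centroid_predict(kernel, X_train, y_train, x):
    """Predict the class whose mean in kernel feature space is closest to x.

    ||phi(x) - mu_c||^2 = k(x,x) - (2/n_c) sum_i k(x,x_i) + (1/n_c^2) sum_ij k(x_i,x_j)
    """
    best_label, best_dist = None, math.inf
    for label in sorted(set(y_train)):
        members = [xi for xi, yi in zip(X_train, y_train) if yi == label]
        n = len(members)
        cross = sum(kernel(x, xi) for xi in members) / n
        within = sum(kernel(xi, xj) for xi in members for xj in members) / (n * n)
        dist = kernel(x, x) - 2.0 * cross + within
        if dist < best_dist:  # ties keep the smaller label
            best_label, best_dist = label, dist
    return best_label


def subset_fitness(mask, X_train, y_train, X_test, y_test,
                   kernels=(linear_kernel, poly_kernel, rbf_kernel)):
    """Average hold-out accuracy of one classifier per kernel on the masked genes."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty gene subset cannot classify anything

    def project(rows):
        return [[row[j] for j in cols] for row in rows]

    X_tr, X_te = project(X_train), project(X_test)
    accuracies = []
    for kernel in kernels:
        hits = sum(kernel_centroid_predict(kernel, X_tr, y_train, x) == label
                   for x, label in zip(X_te, y_test))
        accuracies.append(hits / len(y_test))
    return sum(accuracies) / len(accuracies)
```

Averaging over several classifiers rewards gene subsets that separate the classes under different kernels rather than only under one.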
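
The second step searches over binary gene masks with a binary dragonfly algorithm. The sketch follows the structure of Mirjalili's dragonfly algorithm: each agent's step combines separation, alignment, cohesion, attraction to the best position found (food), and the enemy term X^- + X built from the worst position found, plus an inertia term; in the binary variant, a V-shaped transfer function |Δx / sqrt(Δx^2 + 1)| turns each step component into a bit-flip probability. Assumptions, not taken from the paper: every other agent is treated as a neighbour, the coefficient schedule and the step clamp of ±6 are illustrative choices, updates are synchronous, and `binary_dragonfly` and its parameters are hypothetical names. `fitness` can be any callable mapping a 0/1 list to a score to maximise.

```python
import math
import random


def binary_dragonfly(fitness, n_bits, n_agents=10, max_iter=50, seed=0):
    """Maximise fitness(bits) over 0/1 lists with a binary dragonfly-style search.

    Returns (best_bits, best_fitness) over all evaluated positions.
    Requires n_agents >= 2 so every agent has neighbours.
    """
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_agents)]
    dX = [[0.0] * n_bits for _ in range(n_agents)]  # step (velocity) vectors
    food, food_fit = None, -math.inf    # best position seen so far
    enemy, enemy_fit = None, math.inf   # worst position seen so far
    for t in range(max_iter):
        for x in X:
            f = fitness(x)
            if f > food_fit:
                food, food_fit = x[:], f
            if f < enemy_fit:
                enemy, enemy_fit = x[:], f
        # Illustrative coefficient schedule (assumed, not taken from the paper).
        w = 0.9 - 0.5 * t / max_iter
        my_c = max(0.0, 0.1 - 0.2 * t / max_iter)
        s, a, c = (2 * rng.random() * my_c for _ in range(3))
        f_w, e_w = 2 * rng.random(), my_c
        old_X = [x[:] for x in X]
        old_dX = [v[:] for v in dX]
        for i in range(n_agents):
            others = [j for j in range(n_agents) if j != i]
            m = len(others)
            for d in range(n_bits):
                xi = old_X[i][d]
                S = -sum(xi - old_X[j][d] for j in others)     # separation
                A = sum(old_dX[j][d] for j in others) / m       # alignment
                C = sum(old_X[j][d] for j in others) / m - xi   # cohesion
                F = food[d] - xi                                # attraction to food
                E = enemy[d] + xi                               # enemy term as published
                step = s * S + a * A + c * C + f_w * F + e_w * E + w * old_dX[i][d]
                step = max(-6.0, min(6.0, step))  # assumed step clamp
                dX[i][d] = step
                # V-shaped transfer function gives the probability of flipping the bit.
                if rng.random() < abs(step / math.sqrt(step * step + 1.0)):
                    X[i][d] = 1 - xi
    return food, food_fit
```

In the two-step pipeline described above, `n_bits` would be the number of genes kept by the SVM-RFE prefilter and `fitness` would wrap the averaged kernel-classifier accuracy on the selected genes; the exact settings and stopping rule used in the paper are not given in this abstract.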
