Recursive Memetic Algorithm for gene selection in microarray data

Abstract: Feature selection algorithms contribute substantially to medical diagnosis. Choosing a small subset of genes that enables a classifier to accurately predict the presence or type of a disease is a difficult optimisation problem because of the sheer size of microarray data. The dual objective of achieving high accuracy with a small number of features makes it a challenging research problem. In this work, we develop a Recursive Memetic Algorithm (RMA) model for gene selection. RMA is a variant of the Memetic Algorithm (MA) and performs much better than both MA and the Genetic Algorithm (GA). RMA has been applied to seven microarray datasets, namely AMLGSE2191, Colon, DLBCL, Leukaemia, Prostate, MLL and SRBCT. The encouraging results obtained by the proposed model, reported in this article, are biologically validated using Gene Ontology, KEGG pathways and heat maps.
