A novel gene selection algorithm for cancer classification using microarray datasets

Background: Microarray datasets are an important medical diagnostic tool because they represent the state of a cell at the molecular level. The microarray datasets available for classifying cancer types generally have a small sample size relative to the large number of genes involved. This imbalance is known as the curse of dimensionality and makes classification challenging. Gene selection is a promising way to address it, and it plays an important role in building efficient cancer classifiers because only a small number of genes are related to the classification problem. Gene selection reduces the number of irrelevant and noisy genes and selects the most relevant genes to improve the classification results.

Methods: An innovative Gene Selection Programming (GSP) method is proposed to select relevant genes for effective and efficient cancer classification. GSP is based on the Gene Expression Programming (GEP) method, with a newly defined population initialization algorithm, a new fitness function definition, and improved mutation and recombination operators. A Support Vector Machine (SVM) with a linear kernel serves as the classifier for GSP.

Results: Experimental results on ten microarray cancer datasets demonstrate that GSP is effective and efficient in eliminating irrelevant and redundant genes (features) from microarray datasets. Comprehensive evaluations and comparisons with other methods show that GSP gives a better compromise across all three evaluation criteria: classification accuracy, number of selected genes, and computational cost. The gene set selected by GSP shows superior performance in cancer classification compared with the sets selected by up-to-date representative gene selection methods.

Conclusion: The gene subset selected by GSP can achieve higher classification accuracy with less processing time.
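The idea in Methods of scoring a candidate gene subset by how well a classifier performs on it can be illustrated with a minimal sketch. The abstract does not give the GSP fitness function, so the weighted form below (accuracy plus a small reward for using fewer genes) and the `weight` parameter are assumptions made for illustration only. To keep the sketch dependency-free, a leave-one-out nearest-centroid classifier stands in for the linear-kernel SVM; the function names and the toy data are hypothetical.

```python
from statistics import mean


def nearest_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.

    Stand-in for the linear-kernel SVM named in the abstract (assumption).
    """
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(r[g] for r in rows) for g in genes]

        # Squared Euclidean distance from sample i to a centroid.
        def dist(c):
            return sum((X[i][g] - c[k]) ** 2 for k, g in enumerate(genes))

        pred = min(centroids, key=lambda lab: dist(centroids[lab]))
        correct += pred == y[i]
    return correct / len(X)


def fitness(X, y, genes, weight=0.9):
    """Hypothetical fitness: mostly accuracy, plus a bonus for small subsets."""
    n_genes = len(X[0])
    acc = nearest_centroid_accuracy(X, y, genes)
    sparsity = 1 - len(genes) / n_genes
    return weight * acc + (1 - weight) * sparsity


# Toy data: 6 samples x 4 genes. Gene 0 separates the classes; the rest is noise.
X = [
    [0.1, 5.0, 3.0, 7.0],
    [0.2, 1.0, 8.0, 2.0],
    [0.0, 9.0, 4.0, 5.0],
    [0.9, 4.0, 3.5, 6.0],
    [1.0, 2.0, 7.0, 1.0],
    [0.8, 8.0, 5.0, 4.0],
]
y = [0, 0, 0, 1, 1, 1]

print(fitness(X, y, [0]))           # single informative gene
print(fitness(X, y, [0, 1, 2, 3]))  # all genes
```

Under this assumed scoring, a small subset that classifies well outranks the full gene set, which mirrors the trade-off between accuracy and number of selected genes that the abstract reports. An evolutionary search such as GEP would call a function like `fitness` on each candidate subset in its population.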
