mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we present the first attempt at applying the ABC algorithm to the analysis of microarray gene expression profiles. In addition, we propose an innovative feature selection algorithm, mRMR-ABC, which combines minimum redundancy maximum relevance (mRMR) with the ABC algorithm to select informative genes from microarray profiles. The new approach uses a support vector machine (SVM) to measure the classification accuracy of the selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm through extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare mRMR-ABC with previously known techniques. For a fair comparison, we reimplemented two of these techniques using the same parameters: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with particle swarm optimization (mRMR-PSO). The experimental results show that mRMR-ABC achieves accurate classification performance with a small number of predictive genes on both binary and multiclass datasets when compared to previously suggested methods. This indicates that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.
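The wrapper stage described above can be illustrated with a minimal sketch of an ABC search over gene subsets. It is not the authors' implementation. The function name `abc_select`, the parameters `n_sources`, `limit`, and `max_cycles`, and the swap-one-gene neighbour move are all assumptions made for illustration. The `candidates` argument stands in for the genes that an mRMR filter would rank highest, and `fitness` stands in for the SVM classification accuracy of a subset; both are supplied by the caller, and `fitness` is assumed to return a non-negative score.

```python
import random
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[int, ...]


def abc_select(
    candidates: Sequence[int],
    fitness: Callable[[Subset], float],
    subset_size: int,
    n_sources: int = 10,
    limit: int = 5,
    max_cycles: int = 30,
    seed: int = 0,
) -> Tuple[Subset, float]:
    """Hypothetical ABC search for a gene subset that maximises `fitness`."""
    rng = random.Random(seed)
    pool: List[int] = list(candidates)

    def random_subset() -> Subset:
        return tuple(sorted(rng.sample(pool, subset_size)))

    def neighbour(subset: Subset) -> Subset:
        # Assumed discrete move: swap one selected gene for an unselected one.
        outside = [g for g in pool if g not in subset]
        if not outside:
            return subset
        new = list(subset)
        new[rng.randrange(len(new))] = rng.choice(outside)
        return tuple(sorted(new))

    # Each food source is one candidate gene subset.
    sources = [random_subset() for _ in range(n_sources)]
    scores = [fitness(s) for s in sources]
    trials = [0] * n_sources
    best_score = max(scores)
    best = sources[scores.index(best_score)]

    def remember(i: int) -> None:
        nonlocal best, best_score
        if scores[i] > best_score:
            best, best_score = sources[i], scores[i]

    def try_improve(i: int) -> None:
        # Greedy selection between a source and one of its neighbours.
        cand = neighbour(sources[i])
        cand_score = fitness(cand)
        if cand_score > scores[i]:
            sources[i], scores[i], trials[i] = cand, cand_score, 0
        else:
            trials[i] += 1
        remember(i)

    for _ in range(max_cycles):
        # Employed bees: one local move per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: revisit sources with probability proportional to fitness.
        weights = [s + 1e-9 for s in scores]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        # Scout bees: abandon sources that failed to improve more than `limit` times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_subset()
                scores[i] = fitness(sources[i])
                trials[i] = 0
                remember(i)

    return best, best_score


if __name__ == "__main__":
    # Toy fitness for demonstration only: fraction of chosen genes that are "informative".
    informative = {0, 1, 2}
    toy = lambda s: sum(g in informative for g in s) / len(s)
    print(abc_select(list(range(20)), toy, subset_size=3))
```

In a setting like the one the abstract describes, `fitness` would presumably train an SVM on the expression values of the chosen genes and return a cross-validated accuracy. That part is left to the caller here so that the sketch depends only on the standard library.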
