A multi-objective heuristic algorithm for gene expression microarray data classification

A multi-objective model for microarray data is built on the analytic hierarchy process. A heuristic algorithm called MOEDA, improved from UMDA, is proposed to solve the model. Both classification accuracy and the number of genes are objectives. Classification accuracy is treated as absolutely more important than the number of genes. The method consistently achieves high accuracy with a small number of genes on the tested microarray data.

Microarray data have significant potential in clinical medicine, but they typically contain a very large number of genes relative to the number of samples. Finding a subset of discriminatory genes (features) with intelligent algorithms has become a trend. A disease prognosis expert system built on such a subset could have a substantial effect on clinical medicine, and the fewer genes are selected, the lower the cost of that system. A small gene set with high classification accuracy is therefore the goal. In this paper, a multi-objective model is built according to the analytic hierarchy process (AHP), which treats classification accuracy as absolutely more important than the number of selected genes. A multi-objective heuristic algorithm called MOEDA, an improvement of the Univariate Marginal Distribution Algorithm (UMDA), is proposed to solve the model. Two main rules are designed: the 'Higher and Fewer Rule', used for evaluating and sorting individuals, and the 'Forcibly Decrease Rule', used for generating potential individuals with high classification accuracy and fewer genes. The proposed method is tested on both binary-class and multi-class microarray datasets. The results show that the gene set selected by MOEDA not only yields higher accuracies but also remains small, which both saves computational time and improves the interpretability and applicability of the results through a simple classification model. The proposed MOEDA opens up a new way of applying heuristic algorithms to microarray gene expression data.
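The abstract does not spell out how the two objectives are combined, so the following is only a minimal sketch of one plausible reading: if accuracy is "absolutely more important" than gene count, the 'Higher and Fewer Rule' can be approximated by a lexicographic sort (accuracy first, gene count only as a tie-breaker), followed by the standard UMDA step of estimating per-gene selection frequencies from the best individuals. All function names here (`higher_and_fewer_key`, `rank_population`, `umda_marginals`, `sample_individual`) and the bit-mask encoding are assumptions made for illustration, not the authors' implementation. The 'Forcibly Decrease Rule' is not sketched because the abstract gives no details of it.

```python
import random
from typing import Callable, List, Tuple

Individual = List[int]  # assumed encoding: 1 means the gene is selected


def higher_and_fewer_key(accuracy: float, n_genes: int) -> Tuple[float, int]:
    """Assumed sort key: higher accuracy first, fewer genes only breaks ties."""
    return (-accuracy, n_genes)


def rank_population(
    population: List[Individual],
    accuracy_fn: Callable[[Individual], float],
) -> List[Individual]:
    """Sort individuals lexicographically by (accuracy desc, gene count asc)."""
    scored = [(accuracy_fn(ind), sum(ind), ind) for ind in population]
    scored.sort(key=lambda t: higher_and_fewer_key(t[0], t[1]))
    return [ind for _, _, ind in scored]


def umda_marginals(selected: List[Individual]) -> List[float]:
    """Standard UMDA step: per-gene selection frequency among the chosen individuals."""
    n_genes = len(selected[0])
    return [sum(ind[j] for ind in selected) / len(selected) for j in range(n_genes)]


def sample_individual(marginals: List[float], rng: random.Random) -> Individual:
    """Draw each gene independently from its marginal probability."""
    return [1 if rng.random() < p else 0 for p in marginals]
```

In a full loop, `accuracy_fn` would wrap a cross-validated classifier restricted to the selected genes; here it is left as a caller-supplied function so the sketch stays independent of any particular classifier.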
