Gene expression classification using binary rule majority voting genetic programming classifier

The results of a gene expression study are difficult to interpret. To increase interpretability, researchers have developed classification techniques that produce rules for classifying gene expression profiles. Genetic programming is one method for producing such rules, but the rules it evolves are difficult to interpret because they are based on complicated functions of gene expression values. We propose the binary rule majority voting genetic programming classifier (BRMVGPC), which classifies samples using binary rules based on the detection calls for genes instead of the gene expression values and thereby increases rule interpretability. We evaluated BRMVGPC on two public datasets, one brain cancer and one prostate cancer, and achieved 88.89% and 86.39% accuracy, respectively. These results are comparable to those of other classifiers in the gene expression profile domain. Specific contributions include the BRMVGPC classification technique and an iterative k-nearest neighbour technique for handling marginal detection call values.
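To make the two contributions concrete, the following is a minimal sketch, not the authors' implementation. It assumes detection calls are encoded as "P" (present), "A" (absent) and "M" (marginal), that a binary rule is a Boolean function of per-gene present/absent flags, and that the class is decided by a simple majority of rule votes. The names `impute_marginal`, `classify` and `RULES`, the Hamming distance, and the parameters `k` and `max_rounds` are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import Counter

P, A, M = "P", "A", "M"  # assumed encoding: present, absent, marginal


def hamming(a, b):
    """Count disagreements over genes where neither call is marginal."""
    return sum(1 for x, y in zip(a, b) if M not in (x, y) and x != y)


def impute_marginal(samples, k=3, max_rounds=10):
    """Replace each M call with the majority call of the k nearest samples.

    Repeats until no call changes or max_rounds is reached, so calls filled
    in during one round can serve as neighbours in the next round.
    """
    data = [list(s) for s in samples]
    for _ in range(max_rounds):
        snapshot = [row[:] for row in data]  # neighbours come from the previous round
        changed = False
        for i, row in enumerate(snapshot):
            for g, call in enumerate(row):
                if call != M:
                    continue
                # Only samples with a definite call for gene g can vote.
                candidates = sorted(
                    (hamming(row, other), j)
                    for j, other in enumerate(snapshot)
                    if j != i and other[g] != M
                )
                votes = Counter(snapshot[j][g] for _, j in candidates[:k])
                if votes:
                    data[i][g] = votes.most_common(1)[0][0]
                    changed = True
        if not changed:
            break
    return data


def classify(rules, calls, positive="tumor", negative="normal"):
    """Majority vote: each rule sees per-gene present flags and votes True/False."""
    present = [c == P for c in calls]
    votes = sum(1 for rule in rules if rule(present))
    return positive if 2 * votes > len(rules) else negative


# Hypothetical evolved rules over three genes (indices 0..2).
RULES = [
    lambda g: g[0] and not g[1],
    lambda g: g[2] or g[0],
    lambda g: not g[1] and g[2],
]
```

In this sketch each rule reads as a plain Boolean statement about which genes are detected, which is the kind of interpretability the abstract contrasts with arithmetic functions of expression values. Using an odd number of rules avoids tied votes.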
