Improving Cancer Classification Accuracy Using Gene Pairs

Recent studies suggest that the deregulation of pathways, rather than of individual genes, may be critical in triggering carcinogenesis. Pathway deregulation is often caused by the simultaneous deregulation of more than one gene in the pathway. Robust gene pair combinations may therefore capture the underlying biomolecular interactions relevant to pathway deregulation, and could provide better cancer biomarkers than individual genes. To test this hypothesis, we used gene pair combinations, called doublets, instead of the original expression values as input to cancer classification algorithms, and found that classification accuracy improved consistently across datasets and classification algorithms. We validated the proposed approach on nine cancer datasets with five classification algorithms: Prediction Analysis for Microarrays (PAM), C4.5 Decision Trees (DT), Naive Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN).
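To make the idea of feeding doublets to a classifier concrete, the following is a minimal sketch of how pairwise features might be derived from an expression matrix. The abstract does not specify how doublets are computed, so the three combination rules here (difference, sum, and sign of the difference) are assumptions chosen for illustration, and the function name `make_doublets` and its `kind` parameter are hypothetical.

```python
from itertools import combinations

import numpy as np


def make_doublets(X, kind="diff"):
    """Build pairwise ("doublet") features from an expression matrix.

    X    : array of shape (n_samples, n_genes)
    kind : "diff", "sum", or "sign" -- illustrative combination rules,
           assumed here and not taken from the paper.
    Returns (D, pairs), where D has one column per unordered gene pair
    and pairs[k] = (i, j) names the genes behind column k.
    """
    X = np.asarray(X, dtype=float)
    pairs = list(combinations(range(X.shape[1]), 2))
    i = np.array([p[0] for p in pairs], dtype=int)
    j = np.array([p[1] for p in pairs], dtype=int)
    if kind == "diff":
        D = X[:, i] - X[:, j]
    elif kind == "sum":
        D = X[:, i] + X[:, j]
    elif kind == "sign":
        D = np.sign(X[:, i] - X[:, j])
    else:
        raise ValueError(f"unknown doublet kind: {kind!r}")
    return D, pairs


# Toy data: neither gene alone separates the two classes with a single
# threshold, but the difference between the two genes does.
X = np.array([[1.0, 2.0],
              [5.0, 6.0],
              [2.0, 1.0],
              [6.0, 5.0]])
y = np.array([0, 0, 1, 1])

D, pairs = make_doublets(X, kind="diff")
predicted = (D[:, 0] > 0).astype(int)  # threshold the single doublet at 0
```

In this toy case the doublet column takes the values -1, -1, 1, 1, so a threshold at zero recovers `y` exactly. The resulting matrix `D` could then be passed to any of the classifiers named above in place of `X`. Because the number of pairs grows quadratically with the number of genes, a real pipeline would likely need to filter genes or pairs first; that step is omitted here.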
