Robust two-gene classifiers for cancer prediction.

Two-gene classifiers have attracted broad interest for their simplicity and practicality. Most existing two-gene classification algorithms rely on exhaustive search, which makes them computationally inefficient. In this study, we proposed two new two-gene classification algorithms that used a simple univariate gene selection strategy and constructed simple classification rules based on optimal cut-points for the two selected genes. We detected the optimal cut-point for each gene using the information entropy principle. We applied the two-gene classification models to eleven cancer gene expression datasets and compared their classification performance with that of established two-gene classification models, such as the top-scoring pairs model and the greedy pairs model, as well as standard methods including Diagonal Linear Discriminant Analysis, k-Nearest Neighbor, Support Vector Machine and Random Forest. These comparisons indicated that the performance of our two-gene classifiers was comparable to or better than that of the compared models.
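As an illustration of entropy-based cut-point detection, the following minimal sketch picks, for a single gene, the threshold that minimizes the weighted class entropy of the two resulting groups. It assumes one binary split per gene and candidate thresholds at midpoints between consecutive distinct expression values; the abstract does not specify the exact criterion used in the study, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, weighted_entropy) for the split with the lowest entropy.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns (None, inf) when all values are identical and no split exists.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Entropy of each side, weighted by the share of samples it holds.
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / n
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2, score
    return best_cut, best_score


# Toy data (made up): one gene measured in three normal and three tumor samples.
expression = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
classes = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
cut, score = best_cut_point(expression, classes)
print(cut, score)
```

In a two-gene setting, a function like this could be run once per selected gene, with the two resulting thresholds combined into a classification rule; how the study combines them is not stated in the abstract.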
