A Paradigm for Class Prediction Using Gene Expression Profiles

We propose a general framework for predicting predefined tumor classes from gene expression profiles obtained in microarray experiments. The framework consists of (1) evaluating the appropriateness of class prediction for the given data set, (2) selecting the prediction method, (3) performing cross-validated class prediction, and (4) assessing the significance of the prediction results by permutation testing. We describe an application of the prediction paradigm to gene expression profiles from human breast cancers, with specimens classified as positive or negative for BRCA1 mutations and, separately, for BRCA2 mutations. In both cases, the accuracy of class prediction was statistically significant when compared with the accuracy expected by chance. The framework proposed here is designed to reduce the occurrence of spurious findings, a legitimate concern for high-dimensional microarray data. The prediction paradigm provides a common basis for comparing different prediction methods and may accelerate the development of clinically useful molecular classifiers.
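Steps (3) and (4) of the framework can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the t-like gene ranking, the choice of leave-one-out cross-validation, the `(count + 1) / (n_perm + 1)` p-value convention, the function names, and the synthetic data are all assumptions made for illustration. The point it shows is that gene selection is repeated inside every cross-validation fold, and that the whole cross-validation procedure is rerun on label-permuted data to estimate how often chance alone yields an error count as low as the observed one.

```python
import numpy as np

def predict_one(X_train, y_train, x_test, n_genes=10):
    """Nearest-centroid prediction (hypothetical classifier choice).
    Genes are selected from the training samples only."""
    a, b = X_train[y_train == 0], X_train[y_train == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a)
                 + b.var(axis=0, ddof=1) / len(b)) + 1e-12
    top = np.argsort(-np.abs(diff / se))[:n_genes]  # most discriminating genes
    d0 = np.sum((x_test[top] - a[:, top].mean(axis=0)) ** 2)
    d1 = np.sum((x_test[top] - b[:, top].mean(axis=0)) ** 2)
    return 0 if d0 <= d1 else 1

def loocv_errors(X, y, n_genes=10):
    """Count leave-one-out misclassifications; the held-out sample
    never influences gene selection or the class centroids."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        errors += int(predict_one(X[keep], y[keep], X[i], n_genes) != y[i])
    return errors

def permutation_p_value(X, y, n_perm=200, n_genes=10, seed=0):
    """Proportion of label permutations whose cross-validated error
    count is at most the observed count (with a +1 correction)."""
    rng = np.random.default_rng(seed)
    observed = loocv_errors(X, y, n_genes)
    count = sum(
        loocv_errors(X, rng.permutation(y), n_genes) <= observed
        for _ in range(n_perm)
    )
    return observed, (count + 1) / (n_perm + 1)

# Synthetic stand-in for expression data: 20 specimens, 200 genes,
# with the first 15 genes shifted in class 1 (illustrative only).
rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 200))
X[y == 1, :15] += 2.0

observed, p_value = permutation_p_value(X, y)
print(f"LOOCV errors: {observed} of {len(y)}, permutation p = {p_value:.3f}")
```

Selecting genes once on the full data set and then cross-validating only the classifier would bias the error estimate downward, which is why the selection step sits inside `predict_one` in this sketch.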
