Deriving quantitative conclusions from microarray expression data

MOTIVATION: The last few years have seen the development of DNA microarray technology that allows simultaneous measurement of the expression levels of thousands of genes. While many methods have been developed to analyze such data, most have been visualization-based. Methods that yield quantitative conclusions have been diverse and complex.

RESULTS: We present two straightforward methods for identifying specific genes whose expression is linked with a phenotype or outcome variable as well as for systematically predicting sample class membership: (1) a conservative, permutation-based approach to identifying differentially expressed genes; (2) an augmentation of K-nearest-neighbor pattern classification. Our analyses replicate the quantitative conclusions of Golub et al. (1999; Science, 286, 531-537) on leukemia data, with better classification results, using far simpler methods. With the breast tumor data of Perou et al. (2000; Nature, 406, 747-752), the methods lend rigorous quantitative support to the conclusions of the original paper. In the case of the lymphoma data in Alizadeh et al. (2000; Nature, 403, 503-511), our analyses only partially support the conclusions of the original authors.

AVAILABILITY: The software and supplementary information are available freely to researchers at academic and non-profit institutions at http://cc.ucsf.edu/jain/public
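The abstract names a conservative, permutation-based approach to finding differentially expressed genes but does not give its details. As a rough illustration of the general idea only, the sketch below uses a common maximum-statistic permutation test: class labels are shuffled, and each gene's observed statistic is compared with the largest statistic seen across all genes in each shuffle. The choice of a Welch t-statistic, the function names `welch_t` and `max_stat_permutation_pvalues`, and the parameters `n_perm` and `seed` are assumptions made for this example and are not taken from the paper or its software.

```python
import numpy as np


def welch_t(x, labels):
    """Per-gene Welch t-statistic.

    x is a (genes, samples) array; labels holds 0 or 1 for each sample.
    The statistic is an assumption for this sketch, not the paper's choice.
    """
    a = x[:, labels == 0]
    b = x[:, labels == 1]
    var_a = a.var(axis=1, ddof=1) / a.shape[1]
    var_b = b.var(axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(var_a + var_b)


def max_stat_permutation_pvalues(x, labels, n_perm=1000, seed=0):
    """Adjusted p-values from the permutation null of the maximum |t|.

    Comparing every gene against the maximum over all genes is what
    makes the procedure conservative across thousands of genes.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(welch_t(x, labels))

    # Largest absolute statistic over all genes, for each label shuffle.
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(labels)
        null_max[i] = np.abs(welch_t(x, shuffled)).max()

    # Count shuffles whose maximum reaches each observed statistic;
    # the +1 terms keep the estimate strictly above zero.
    exceed = (null_max[None, :] >= observed[:, None]).sum(axis=1)
    return (exceed + 1) / (n_perm + 1)


# Synthetic example: 200 genes, 10 samples per class, gene 0 shifted.
demo_rng = np.random.default_rng(1)
demo_labels = np.array([0] * 10 + [1] * 10)
demo_x = demo_rng.normal(size=(200, 20))
demo_x[0, demo_labels == 1] += 5.0
demo_p = max_stat_permutation_pvalues(demo_x, demo_labels, n_perm=500, seed=2)
```

In this synthetic setup, genes with small values in `demo_p` are those whose class difference exceeds what label shuffling produces anywhere in the data set. Real expression data would likely need preprocessing (for example log transformation and handling of missing values) that this sketch omits.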
