Graphical Methods for Class Prediction Using Dimension Reduction Techniques on DNA Microarray Data

MOTIVATION: We introduce simple graphical classification and prediction tools for tumor status based on gene-expression profiles. They rely on two dimension estimation techniques: sliced average variance estimation (SAVE) and sliced inverse regression (SIR). Both SAVE and SIR are used to infer the dimension of the classification problem and to obtain linear combinations of genes that contain sufficient information to predict class membership, such as tumor type. Plots of the estimated directions, together with numerical thresholds estimated from the plots, are used to predict tumor classes in cDNA microarrays, and the performance of the class predictors is assessed by cross-validation. A microarray simulation study is carried out to compare the power and predictive accuracy of the two methods.

RESULTS: The methods are applied to cDNA microarray data on BRCA1 and BRCA2 mutation carriers as well as sporadic tumors from Hedenfalk et al. (2001). All samples are correctly classified.
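To make the idea of extracting class-predictive linear combinations concrete, the following is a minimal sketch of the textbook SIR and SAVE kernel computations with the class label used as the slicing variable. It is not the authors' implementation: the function names, the toy data, and the assumption that there are more samples than predictors (so the covariance matrix is invertible) are all illustrative. Real microarray data have far more genes than samples, so some preliminary reduction or gene selection would presumably be needed before a step like this.

```python
import numpy as np

def _standardize(X):
    """Center X and whiten it with the inverse square root of its covariance."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    # Assumption: cov is positive definite (more samples than predictors).
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return (X - mu) @ inv_sqrt, inv_sqrt

def sir_save_directions(X, y, method="sir"):
    """Return eigenvalues and directions (columns) of the SIR or SAVE kernel.

    Each class label in y is treated as one slice.
    """
    Z, inv_sqrt = _standardize(X)
    n, p = Z.shape
    M = np.zeros((p, p))
    for label in np.unique(y):
        Zh = Z[y == label]
        w = len(Zh) / n  # slice proportion
        if method == "sir":
            m = Zh.mean(axis=0)  # slice mean of the standardized predictors
            M += w * np.outer(m, m)
        else:  # SAVE: uses within-slice covariance instead of slice means
            D = np.eye(p) - np.cov(Zh, rowvar=False)
            M += w * D @ D
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]  # largest eigenvalue first
    # Map directions back to the original predictor scale.
    return evals[order], inv_sqrt @ evecs[:, order]

# Toy data (hypothetical): two classes that differ only in the first variable.
rng = np.random.default_rng(0)
n_per_class, p = 100, 4
X0 = rng.normal(size=(n_per_class, p))
X1 = rng.normal(size=(n_per_class, p))
X1[:, 0] += 3.0
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n_per_class)

evals, dirs = sir_save_directions(X, y, method="sir")
# Projection onto the first estimated direction; plotting this against y
# is the kind of display from which a class threshold could be read off.
score = (X - X.mean(axis=0)) @ dirs[:, 0]
```

With two classes, SIR can recover at most one direction because the kernel is built from two slice means that sum to zero after centering. SAVE also uses within-class covariance differences and so can, in principle, pick up additional directions; passing `method="save"` to the same function switches the kernel.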

[1] Benzion Boukai, et al. The Discrimination Subspace Model, 1997.

[2] R. Cook. Graphics for regressions with a binary response, 1996.

[3] S. Dudoit, et al. Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data, 2002.

[4] J. Brian Gray, et al. Applied Regression Including Computing and Graphics, 1999, Technometrics.

[5] R. Cook. Regression Graphics, 1994.

[6] S. P. Fodor, et al. High density synthetic oligonucleotide arrays, 1999, Nature Genetics.

[7] J. Friedman, et al. Projection Pursuit Regression, 1981.

[8] B. Silverman, et al. Nonparametric Regression and Generalized Linear Models: A roughness penalty approach, 1993.

[9] R. Cook, et al. Estimating the structural dimension of regressions via parametric inverse regression, 2001.

[10] R. Cook, et al. Extending Sliced Inverse Regression, 2001.

[11] R. Cook, et al. Identifying Regression Outliers and Mixtures Graphically, 2000.

[12] F. Chiaromonte, et al. Dimension reduction strategies for analyzing global gene expression data with a response, 2002, Mathematical Biosciences.

[13] L. Penland, et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer, 1996, Nature Genetics.

[14] B. Silverman, et al. Nonparametric Regression and Generalized Linear Models: A roughness penalty approach, 1993.

[15] R. Cook, et al. Dimension Reduction in Binary Response Regression, 1999.

[16] Mark Schena, et al. DNA microarrays: a practical approach, 1999.

[17] T. Kepler, et al. Normalization and analysis of DNA microarray data by self-consistency and local regression, 2002, Genome Biology.

[18] Ker-Chau Li, et al. Sliced Inverse Regression for Dimension Reduction, 1991.

[19] J. Sudbø, et al. Gene-expression profiles in hereditary breast cancer, 2001, The New England Journal of Medicine.

[20] R. Cook, et al. Reweighting to Achieve Elliptically Contoured Covariates in Regression, 1994.

[21] S. Weisberg, et al. Comments on "Sliced inverse regression for dimension reduction" by K. C. Li, 1991.

[22] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.