Models for microarray gene expression data

This paper describes a general methodology for analyzing differential gene expression from microarray data. We first characterize the data with a linear statistical model that accounts for the relevant sources of variation, and then consider estimation of the model parameters. Because microarray studies typically involve thousands of genes, we propose a two-stage method for parameter estimation. The interaction terms between genes and experimental conditions in this model capture all relevant information about differential gene expression in the microarray data. For a summary statistic of differential expression, we propose a mixture distribution model consisting of null and alternative component distributions. The mixture model suggests two methods for identifying differentially expressed genes: a frequentist method that identifies distinguished genes, and an empirical Bayes procedure that yields estimated posterior probabilities of differential expression conditional on the observed microarray readings. An extensive case application involving juvenile cystic kidney disease in mice illustrates the methodology. The application controls for variation arising from array, color channel, experimental condition (tissue type), and gene; the analysis of variance (ANOVA) model includes both the main effects, which normalize the expression data, and all interaction terms involving genes. The gene expression profile is found to vary by tissue type, as expected, but also by color channel, which was less expected. A concluding section discusses some outstanding research questions in the analysis of microarray data.
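The empirical Bayes step can be illustrated with a minimal sketch. It assumes a two-component mixture in which both the null and the alternative component are zero-mean normal densities with known standard deviations, and in which the mixing proportion is known. The abstract does not specify the component distributions or how they are estimated, so the function names, the normal form, and the numeric values below are all hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_differential(stat, p_alt, null_sd, alt_sd):
    """Posterior probability that a gene is differentially expressed.

    Assumption (not from the paper): the summary statistic follows
    (1 - p_alt) * N(0, null_sd^2) + p_alt * N(0, alt_sd^2).
    Bayes' rule gives p_alt * f1 / ((1 - p_alt) * f0 + p_alt * f1).
    """
    f0 = normal_pdf(stat, 0.0, null_sd)  # null component density
    f1 = normal_pdf(stat, 0.0, alt_sd)   # alternative component density
    numerator = p_alt * f1
    return numerator / ((1.0 - p_alt) * f0 + numerator)


# Hypothetical summary statistics for four genes.
stats = [0.1, -0.4, 2.5, -3.2]
posteriors = [
    posterior_differential(s, p_alt=0.1, null_sd=0.5, alt_sd=2.0)
    for s in stats
]
```

With a wider alternative component than null component, the posterior probability grows with the absolute value of the statistic, so genes with large observed effects receive posterior probabilities near one. In practice the mixing proportion and component parameters would be estimated from the data rather than fixed in advance.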
