Gene expression data: The technology and statistical analysis

The desire to view the simultaneous behavior of genes affected by a stimulus at the level of the whole genome has brought the scientific world to a new place in history. It is now commonplace for an experiment to investigate the expression of thousands of genes across treatments and time points. Biologists are quickly realizing that statistical models are needed to make sense of these data and of the variation inherent in the experimental process. This article presents important aspects of the two most common microarray technologies, the spotted array and the oligonucleotide array, in order to identify the common and unique features of each technology and of the data it produces. Statistical models are suggested, and the statistical literature is reviewed, in an attempt to bring some simplicity to the daunting task of analyzing these data. We include two examples, each based on one of the two technologies, suggest a statistical model, and present the results of the analyses in the hope of providing both encouragement and guidance to readers who want to become more involved in this exciting field known as genomics.
