Reproducibility, sources of variability, pooling, and sample size: important considerations for the design of high-density oligonucleotide array experiments.

We undertook a series of experiments to examine several issues that directly affect the design of gene expression studies using Affymetrix GeneChip arrays: probe-level analysis, the need for technical replication, the relative contributions of various sources of variability, and the utility of pooling RNA from different samples. Probe-level data were analyzed with Affymetrix MAS 5.0 and with three model-based methods: the PM-MM and PM-only models in dChip and the RMA model in Bioconductor. The latter two (the dChip PM-only model and RMA) performed best. We found that replicate chips of the same RNA have limited value in reducing total variability. For relatively highly expressed genes in this biologically homogeneous animal model of aging, about 11% of total variation is due to day effects, and the remainder is split approximately equally between sample and residual sources. We also found that pooling samples is neither advantageous nor detrimental. Finally, we suggest a strategy for sample size calculations using formulas appropriate when coefficients of variation are known, target effects are expressed as fold changes, and data can be assumed to be approximately lognormally distributed.
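
As an illustration of the kind of sample size calculation described above, the following is a minimal sketch, not necessarily the exact formula used in the paper. It assumes a two-group comparison with equal group sizes, a two-sided test based on the normal approximation, and lognormal data, for which the log-scale variance is ln(1 + CV^2) and the target effect is the log of the fold change. The function name `samples_per_group` and the default values for `alpha` and `power` are hypothetical choices made for this example.

```python
import math
from statistics import NormalDist


def samples_per_group(cv, fold_change, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-group comparison.

    Assumptions (illustrative, not taken from the paper): equal group
    sizes, a two-sided normal-approximation test, and lognormal data with
    the same coefficient of variation (cv) in both groups.
    """
    if cv <= 0:
        raise ValueError("cv must be positive")
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")

    # For lognormal data, the variance on the natural-log scale is ln(1 + CV^2).
    log_variance = math.log(1.0 + cv ** 2)

    # The target effect on the log scale is the log of the fold change.
    log_effect = math.log(fold_change)

    # Standard normal quantiles for the significance level and the power.
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)

    # Standard two-sample formula applied on the log scale.
    n = 2.0 * (z_alpha + z_beta) ** 2 * log_variance / log_effect ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: 30% coefficient of variation, 1.5-fold target change.
    print(samples_per_group(cv=0.30, fold_change=1.5))
```

Because the effect enters only through the squared log fold change, a twofold decrease (fold change 0.5) requires the same number of samples as a twofold increase, and the required sample size rises quickly as the target fold change approaches 1. The normal approximation tends to understate the requirement slightly when the resulting group sizes are small.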
