Microarrays: how many do you need?

We estimate the number of microarrays required to obtain reliable results from a common type of study: the pairwise comparison of different classes of samples. Current knowledge seems to suffice for constructing models that are realistic with respect to searches for individual differentially expressed genes. Such models allow us to investigate how the required number of samples depends on the relevant parameters: the biological variability of the samples within each class; the fold changes in expression; the detection sensitivity of the microarrays; and the acceptable error rates of the results. We supply experimentalists with general conclusions as well as a freely accessible Java applet at http://cartan.gmd.de/~zien/classsize/ for fine-tuning simulations to their particular circumstances. Since the situation can be assumed to be very similar for large-scale proteomics and metabolomics studies, our methods and results might also apply there.
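
To give a feel for how the required number of samples depends on these parameters, the following is a minimal sketch based on the textbook normal-approximation sample-size formula for a two-sided, two-sample comparison of log2 expression values. It is not the model or the applet described above. The function name `samples_per_class` and its parameters are hypothetical, the within-class variability is assumed to be a single known standard deviation on the log2 scale, and detection sensitivity is not modelled at all.

```python
import math
from statistics import NormalDist


def samples_per_class(fold_change, sigma_log2, alpha=0.05, power=0.9):
    """Approximate samples needed per class for one gene.

    Assumptions (illustrative, not taken from the paper):
    - log2 expression is normally distributed within each class,
    - both classes share the same known standard deviation sigma_log2,
    - a two-sided z-test at level alpha is applied to a single gene.
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    if sigma_log2 <= 0:
        raise ValueError("sigma_log2 must be positive")

    delta = abs(math.log2(fold_change))          # effect size on the log2 scale
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls false positives
    z_beta = NormalDist().inv_cdf(power)           # controls false negatives

    n = 2 * ((z_alpha + z_beta) * sigma_log2 / delta) ** 2
    return math.ceil(n)


# Two-fold change, within-class SD of 0.5 on the log2 scale.
n_single_gene = samples_per_class(fold_change=2.0, sigma_log2=0.5)

# Same gene, but with a Bonferroni-style level for 10,000 genes tested at once.
n_many_genes = samples_per_class(fold_change=2.0, sigma_log2=0.5, alpha=0.05 / 10_000)
```

Under these assumptions the required number grows with the square of the within-class variability, shrinks with the square of the log fold change, and increases as the acceptable error rates are tightened, for instance when the significance level is divided among many simultaneously tested genes.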
