A mixture model approach to sample size estimation in two-sample comparative microarray experiments

Background
Choosing the appropriate sample size is an important step in the design of a microarray experiment, and methods have recently been proposed that estimate sample sizes for control of the False Discovery Rate (FDR). Many of these methods require knowledge of the distribution of effect sizes among the differentially expressed genes. If this distribution can be determined, then accurate sample size requirements can be calculated.

Results
We present a mixture model approach to estimating the distribution of effect sizes in data from two-sample comparative studies. Specifically, we present a novel, closed-form algorithm for estimating the noncentrality parameters in the test statistic distributions of differentially expressed genes. We then show how our model can be used to estimate sample sizes that control the FDR together with other statistical measures such as average power or the false nondiscovery rate. Method performance is evaluated through a comparison with existing methods for sample size estimation, and is found to be very good.

Conclusion
A novel method for estimating the appropriate sample size for a two-sample comparative microarray study is presented. The method is shown to perform very well when compared to existing methods.
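The link between an effect-size mixture and a sample size can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the mixture (the proportion of true nulls `pi0`, the standardized effect sizes, and their weights) is already known, and it replaces the noncentral t distributions of the paper with a normal approximation so that only the Python standard library is needed. All function names and numeric values are hypothetical. The sketch uses the standard relation FDR = pi0·alpha / (pi0·alpha + (1 − pi0)·power) to find the per-test cutoff alpha that matches a target FDR at a target average power, then searches for the smallest per-group sample size reaching that power.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used as an approximation to the t statistic


def average_power(n, alpha, effects, weights):
    """Average power of a two-sided two-sample z-test with n samples per group.

    effects: standardized mean differences of the non-null mixture components.
    weights: mixture weights of those components (assumed to sum to 1).
    """
    crit = _STD.inv_cdf(1 - alpha / 2)
    total = 0.0
    for delta, w in zip(effects, weights):
        ncp = delta * sqrt(n / 2)  # noncentrality under the normal approximation
        total += w * (_STD.cdf(-crit - ncp) + 1 - _STD.cdf(crit - ncp))
    return total


def fdr(alpha, power, pi0):
    """Expected proportion of false discoveries at per-test cutoff alpha."""
    return pi0 * alpha / (pi0 * alpha + (1 - pi0) * power)


def required_sample_size(pi0, effects, weights, fdr_target, power_target, n_max=1000):
    """Smallest n per group whose average power reaches power_target at the
    cutoff alpha implied by fdr_target; returns (n, alpha) or None."""
    # Solve fdr(alpha, power_target, pi0) == fdr_target for alpha.
    alpha = fdr_target * (1 - pi0) * power_target / (pi0 * (1 - fdr_target))
    for n in range(2, n_max + 1):
        if average_power(n, alpha, effects, weights) >= power_target:
            return n, alpha
    return None


# Hypothetical inputs: 90% true nulls, two equally weighted effect sizes.
PI0, EFFECTS, WEIGHTS = 0.9, [1.0, 2.0], [0.5, 0.5]
n, alpha = required_sample_size(PI0, EFFECTS, WEIGHTS, fdr_target=0.05, power_target=0.8)
print(n, alpha, average_power(n, alpha, EFFECTS, WEIGHTS))
```

Because power only increases with n at a fixed cutoff, the FDR at the returned sample size is at most the target. In practice the mixture parameters would first have to be estimated from pilot data, which is the step the paper addresses.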
