Sample Size Calculation of RNA-sequencing Experiment: A Simulation-Based Approach of TCGA Data

Power and sample size calculation is an essential component of experimental design in biomedical research. For RNA-sequencing (RNA-seq) experiments, sample size calculations have been proposed based on mathematical models such as the Poisson and negative binomial distributions; however, RNA-seq data exhibit variation, i.e. over-dispersion, that has caused past calculation methods to produce under- or over-powered designs. Because of this issue, and because the field lacked a simulation-based sample size calculation method for differential expression analysis of RNA-seq data, we developed such a method and applied it to three cancer sites from The Cancer Genome Atlas (TCGA). Our results showed that each cancer site had its own dispersion distribution, which influenced the power and sample size calculation.
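To make the idea of a simulation-based calculation concrete, the following is a minimal sketch for a single gene, not the method developed in the paper. It assumes counts follow a negative binomial distribution with variance mu + dispersion * mu^2, and it swaps in a Welch t-test on log2-transformed counts as a stand-in for a dedicated differential expression test. The function names (`simulate_power`, `smallest_sample_size`) and all parameter values are hypothetical, and multiple-testing control across genes is left out.

```python
import numpy as np
from scipy import stats


def simulate_power(n_per_group, mean_count, fold_change, dispersion,
                   alpha=0.05, n_sim=2000, seed=0):
    """Estimate power for one gene by repeated simulation.

    Assumption: counts are negative binomial with mean mu and
    variance mu + dispersion * mu**2.
    """
    rng = np.random.default_rng(seed)
    size = 1.0 / dispersion  # NB "number of successes" parameter

    def draw(mu):
        # NumPy parameterizes the NB by (n, p); p = n / (n + mu) gives mean mu.
        p = size / (size + mu)
        return rng.negative_binomial(size, p, size=(n_sim, n_per_group))

    # One row per simulated experiment, one column per sample.
    control = np.log2(draw(mean_count) + 1.0)
    treated = np.log2(draw(mean_count * fold_change) + 1.0)

    # Stand-in test (an assumption of this sketch): Welch t-test per experiment.
    _, pvals = stats.ttest_ind(control, treated, axis=1, equal_var=False)

    # Power is the fraction of simulated experiments that reject at level alpha.
    return float(np.mean(pvals < alpha))


def smallest_sample_size(target_power, candidates, **kwargs):
    """Return the first candidate n per group reaching target_power, else None."""
    for n in candidates:
        if simulate_power(n, **kwargs) >= target_power:
            return n
    return None
```

Calling `smallest_sample_size(0.8, range(3, 40), mean_count=50, fold_change=2.0, dispersion=0.2)` scans group sizes until the simulated power reaches 80%. Rerunning the scan with a larger `dispersion` shows how a site-specific dispersion can change the required sample size; drawing the dispersion from an empirical distribution rather than fixing it would be a natural extension.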
