Power analysis for RNA-Seq differential expression studies

Background: Sample size calculation and power estimation are essential components of experimental design in biomedical research. Estimating power for RNA-Seq differential expression under complex experimental designs is very challenging. Moreover, the dependency among genes should be taken into account in order to obtain accurate results.

Results: In this paper, we propose a simulation-based procedure for power estimation using the negative binomial distribution. The procedure assumes a gene-level generalized linear model that accounts for the dependence between a gene's expression level and its variance (dispersion), and it allows equal or unequal dispersion across conditions. We compared the performance of the Wald test and the likelihood ratio test under different scenarios. The null distribution of the test statistics was simulated to achieve the desired false positive control, avoiding the excess false positives that arise when an asymptotic chi-square distribution is used. We applied this method to the TCGA breast cancer data set.

Conclusions: We provide a framework for power estimation of RNA-Seq data. The proposed procedure properly controls the false positive error rate at the nominal level.
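
The idea of calibrating the test against a simulated null distribution, rather than an asymptotic chi-square cutoff, can be illustrated with a minimal sketch. This is not the authors' implementation: it treats a single gene, assumes the per-group dispersions are known, uses a closed-form Wald statistic for the log fold change instead of fitting a generalized linear model, and ignores dependence among genes. All function names, the pseudocount, and the parameter values are hypothetical choices made for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_nb(rng, mean, dispersion):
    """Negative binomial draw with variance mean + dispersion * mean**2,
    generated as a gamma-Poisson mixture."""
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return sample_poisson(rng, lam)


def wald_statistic(counts_a, counts_b, disp_a, disp_b, pseudo=0.5):
    """Squared Wald statistic for the log fold change between two groups.

    Assumption: dispersions are known, and a pseudocount (hypothetical choice)
    keeps the logarithm finite when a group has all-zero counts.
    """
    mean_a = sum(counts_a) / len(counts_a) + pseudo
    mean_b = sum(counts_b) / len(counts_b) + pseudo
    # Delta-method variance of log(sample mean) under the negative binomial model.
    var = ((1.0 / mean_a + disp_a) / len(counts_a)
           + (1.0 / mean_b + disp_b) / len(counts_b))
    return (math.log(mean_b) - math.log(mean_a)) ** 2 / var


def simulate_statistics(rng, n_sims, n_per_group, mean_a, mean_b, disp_a, disp_b):
    """Simulate n_sims two-group experiments and return their Wald statistics."""
    stats = []
    for _ in range(n_sims):
        a = [sample_nb(rng, mean_a, disp_a) for _ in range(n_per_group)]
        b = [sample_nb(rng, mean_b, disp_b) for _ in range(n_per_group)]
        stats.append(wald_statistic(a, b, disp_a, disp_b))
    return stats


def estimate_power(mean, fold_change, disp_a, disp_b, n_per_group,
                   alpha=0.05, n_sims=2000, seed=1):
    """Return (critical value, estimated power) for one gene.

    The critical value is the empirical (1 - alpha) quantile of the statistic
    simulated under the null (equal means), so no chi-square cutoff is used.
    """
    rng = random.Random(seed)
    null = sorted(simulate_statistics(rng, n_sims, n_per_group,
                                      mean, mean, disp_a, disp_b))
    critical = null[min(n_sims - 1, math.ceil((1 - alpha) * n_sims) - 1)]
    alt = simulate_statistics(rng, n_sims, n_per_group,
                              mean, mean * fold_change, disp_a, disp_b)
    power = sum(s > critical for s in alt) / n_sims
    return critical, power


if __name__ == "__main__":
    # Unequal dispersion across conditions, five samples per group.
    critical, power = estimate_power(20.0, 2.0, 0.1, 0.3, n_per_group=5)
    print(f"simulated critical value: {critical:.3f}, estimated power: {power:.3f}")
```

Because the cutoff comes from the simulated null rather than from a chi-square quantile, the rejection rate under the null stays close to `alpha` even at small sample sizes. A fuller treatment would estimate dispersion from data and refit the model in each simulated data set.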
