Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

Background

RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, offering both high throughput and high resolution depending on the experimental design chosen. Multiplex experimental designs are now readily available; these can be used to increase the number of samples or replicates profiled, at the cost of decreased sequencing depth per sample. These strategies affect the power of the approach to identify differential expression accurately. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods.

Results

Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power, in terms of true and false positive rates, when simulating variations in sequencing depth, biological replication and multiplex experimental design choices.

Conclusions

This work quantitatively compares contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to the experimental designs tested, this work strongly suggests that greater power is gained through the use of biological replicates than through library (technical) replicates or sequencing depth. Strikingly, sequencing depth could be reduced to as low as 15% without substantial impacts on false positive or true positive rates.
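As an illustration of the kind of simulation described in the Results, the following minimal Python sketch draws negative binomial counts for two conditions at a chosen fraction of full sequencing depth. It is not the authors' procedure: the function names, the exponential baseline with mean 50, the dispersion of 0.1, the two-fold effect and the 10% proportion of differentially expressed genes are all assumptions chosen for illustration, whereas the study derived its distributions from real RNA-Seq data.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson draw via Knuth's method, split into chunks to avoid underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
        lam -= chunk
    return total


def sample_nb(mean, dispersion, rng):
    """Negative binomial draw as a gamma-Poisson mixture.

    Variance is mean + dispersion * mean**2.
    """
    if mean <= 0:
        return 0
    # gammavariate(shape, scale) has mean shape * scale, here equal to `mean`.
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return sample_poisson(lam, rng)


def simulate_counts(n_genes=200, n_reps=3, depth_fraction=1.0,
                    fold_change=2.0, de_fraction=0.1, dispersion=0.1,
                    mean_scale=50.0, seed=0):
    """Simulate per-gene counts for control and treated replicates.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    control, treated, is_de = [], [], []
    for _ in range(n_genes):
        # Baseline expression from an exponential distribution, scaled by depth.
        base = rng.expovariate(1.0 / mean_scale) * depth_fraction
        de = rng.random() < de_fraction
        effect = fold_change if de else 1.0
        control.append([sample_nb(base, dispersion, rng) for _ in range(n_reps)])
        treated.append([sample_nb(base * effect, dispersion, rng)
                        for _ in range(n_reps)])
        is_de.append(de)
    return control, treated, is_de


if __name__ == "__main__":
    for depth in (1.0, 0.15):
        control, treated, is_de = simulate_counts(depth_fraction=depth, seed=1)
        total = sum(sum(row) for row in control)
        print(f"depth={depth}: control reads={total}, DE genes={sum(is_de)}")
```

Changing `n_reps` stands in for adding biological replicates, while `depth_fraction` mimics the reduced per-sample depth of a multiplexed run. Passing the simulated counts to a differential expression method and comparing its calls with `is_de` would then give true and false positive rates for each design.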
