Side-by-side analysis of alternative approaches on multi-level RNA-seq data

Background: RNA sequencing (RNA-seq) is widely used for RNA quantification across the environmental, biological and medical sciences; it enables the description of genome-wide expression patterns and the inference of regulatory interactions and networks. Computational analyses aim for an accurate output, i.e. a rigorous quantification of genes/transcripts that allows a reliable prediction of differential expression (DE) despite the variable levels of noise and bias present in sequencing data. The evaluation of sequencing quality and normalization are essential components of this process.

Results: We investigate the discriminative power of existing approaches for the quality checking of mRNA-seq data and propose additional, quantitative quality checks. To accommodate a nested, multi-level design, using data on D. melanogaster, we incorporated the sample layout into the analysis. We describe a "subsampling without replacement"-based normalization and an identification of DE that accounts for the experimental design, i.e. the hierarchy and amplitude of effect sizes within samples. We also evaluate the differential expression calls in comparison with existing approaches. To assess the broader applicability of these methods, we applied the same series of steps to a published set of H. sapiens mRNA-seq samples.

Conclusions: The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. Overall, the proposed approach offers the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments into the data analysis.
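The idea behind a "subsampling without replacement" normalization can be illustrated with a minimal sketch. This is not the authors' implementation: the function name subsample_counts, the toy gene counts, and the choice of the smallest library size as the common target are all assumptions made for illustration. Each library is reduced to the same total by drawing reads without replacement, so no gene can end up with more reads than were observed.

```python
import random
from collections import Counter

def subsample_counts(counts, target_total, seed=0):
    """Draw target_total reads without replacement from a per-gene count table.

    Hypothetical helper for illustration; not taken from the paper.
    """
    total = sum(counts.values())
    if target_total > total:
        raise ValueError("target_total exceeds the library size")
    # Expand the table into one entry per read (acceptable for a toy example only).
    reads = [gene for gene, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = Counter(rng.sample(reads, target_total))
    # Keep every gene in the output, including those that drew zero reads.
    return {gene: drawn.get(gene, 0) for gene in counts}

# Hypothetical toy libraries with different sequencing depths.
sample_a = {"geneA": 500, "geneB": 300, "geneC": 200}
sample_b = {"geneA": 1100, "geneB": 500, "geneC": 400}

# Assumed target: the size of the smallest library.
target = min(sum(sample_a.values()), sum(sample_b.values()))
norm_a = subsample_counts(sample_a, target)
norm_b = subsample_counts(sample_b, target)
```

After this step both libraries have the same total, and the subsampled counts of a gene never exceed its original count. A real dataset would need a sampler that does not expand every read into memory, and a design-aware choice of target that this sketch does not attempt.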
