Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package

As RNA-seq has become widely used, awareness has grown of the importance of experimental design, bias removal, accurate quantification and control of false positives for proper data analysis. We introduce the NOISeq R package for quality control and analysis of count data. We show how its diagnostic tools can be used to monitor quality issues, make pre-processing decisions and improve analysis. We demonstrate that the non-parametric NOISeqBIO method efficiently controls false discoveries in experiments with biological replication and outperforms state-of-the-art methods. NOISeq is a comprehensive resource that meets current needs for robust, data-aware analysis of RNA-seq differential expression.
