Salmon provides fast and bias-aware quantification of transcript expression

We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA-seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.
