RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes

Motivation: The alignment of sequencing reads to a transcriptome is a common and important step in many RNA-seq analysis tasks. When aligning RNA-seq reads directly to a transcriptome (as is common in the de novo setting or when a trusted reference annotation is available), care must be taken to report the potentially large number of multi-mapping locations per read. This can pose a substantial computational burden for existing aligners, and can considerably slow downstream analysis.

Results: We introduce a novel concept, quasi-mapping, and an efficient algorithm implementing this approach for mapping sequencing reads to a transcriptome. By attempting only to report the potential loci of origin of a sequencing read, and not the base-to-base alignment by which it derives from the reference, RapMap (our tool implementing quasi-mapping) is capable of mapping sequencing reads to a target transcriptome substantially faster than existing alignment tools. The algorithm we use to implement quasi-mapping uses several efficient data structures and takes advantage of the special structure of shared sequence prevalent in transcriptomes to rapidly provide highly accurate mapping information. We demonstrate how quasi-mapping can be successfully applied to the problems of transcript-level quantification from RNA-seq reads and the clustering of contigs from de novo assembled transcriptomes into biologically meaningful groups.

Availability and implementation: RapMap is implemented in C++11 and is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/RapMap.

Contact: rob.patro@cs.stonybrook.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
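The distinction the abstract draws between reporting loci of origin and computing a base-to-base alignment can be illustrated with a toy sketch. This is not RapMap's algorithm: the abstract does not specify its data structures, so the plain k-mer hash index, the exact-match requirement, and the names `build_kmer_index` and `candidate_loci` are all assumptions made for illustration only. The sketch shows how a read shared between two transcripts yields two candidate loci without any alignment being computed.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map each k-mer to the (transcript_name, offset) pairs where it occurs.

    Hypothetical helper for illustration; not part of RapMap.
    """
    index = defaultdict(list)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def candidate_loci(read, index, k):
    """Return (transcript, start) loci supported by every k-mer of the read.

    No base-to-base alignment is computed: a locus survives only if each
    k-mer of the read hits the same transcript at a consistent implied
    read start. Exact matches only (a simplifying assumption).
    """
    hits = None
    for j in range(len(read) - k + 1):
        # A k-mer at read offset j found at transcript offset pos implies
        # that the read starts at pos - j on that transcript.
        implied = {(name, pos - j) for name, pos in index.get(read[j:j + k], [])}
        hits = implied if hits is None else hits & implied
        if not hits:
            return []
    return sorted(hits or [])


# Two toy transcripts that share a stretch of sequence, as isoforms often do.
transcripts = {
    "tx1": "ACGTACGTTAGCCGATAGGCTA",
    "tx2": "TTTTAGCCGATAGGCTACCCC",
}
index = build_kmer_index(transcripts, k=5)
loci = candidate_loci("TAGCCGATAGG", index, k=5)
print(loci)  # [('tx1', 8), ('tx2', 3)]
```

Because the read lies entirely within the shared stretch, both transcripts are reported, which is the multi-mapping situation the Motivation paragraph describes.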

Robert Patro | Nitish Gupta | Avi Srivastava | Hirak Sarkar
