Towards Selective-Alignment: Producing Accurate And Sensitive Alignments Using Quasi-Mapping

We introduce a novel algorithm for selectively aligning high-throughput sequencing reads to a transcriptome. This algorithm attempts to bridge the gap between fast “mapping” algorithms and more traditional alignment procedures. The former of these simply provide the transcripts, loci and orientations that likely generated a read (or, perhaps, simply the transcripts that are “compatible” with a read), while the latter produces the optimal edit distance and nucleotide-level correspondence between the read and reference sequences. We adopt a hybrid approach that is able to produce accurate alignments while still retaining much of the efficiency of fast mapping algorithms. To achieve this, we make fundamental modifications to an existing mapping algorithm, quasi-mapping, which increases the sensitivity of the procedure. Additionally, unlike the strategies adopted in most aligners which first align the ends of paired-end reads independently, we introduce a notion of co-mapping. This procedure exploits relevant information between the “hits” from the left and right ends of paired-end reads before full alignments or mappings for each are generated, which improves the efficiency of filtering likely-spurious alignments. Finally, we demonstrate the utility of selective alignment in improving the accuracy of efficient transcript-level quantification from RNA-seq reads. Specifically, we show that selective-alignment is able to resolve certain complex mapping scenarios that can confound existing fast mapping procedures, while simultaneously eliminating spurious alignments that fast mapping approaches can produce. Availability and implementation: Selective-alignment is implemented in C++11 as a part of RapMap, and is available as open source software, under GPL v3, at https://github.com/COMBINE-lab/RapMap/tree/selective-alignment.

[1]  Eugene W. Myers,et al.  A fast bit-vector algorithm for approximate string matching based on dynamic programming , 1998, JACM.

[2]  Steven L Salzberg,et al.  HISAT: a fast spliced aligner with low memory requirements , 2015, Nature Methods.

[3]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .

[4]  Robert Patro,et al.  RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes , 2015, bioRxiv.

[5]  Robert Patro,et al.  Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms , 2013, ArXiv.

[6]  W. Shi,et al.  The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote , 2013, Nucleic acids research.

[7]  Juan González-Vallinas,et al.  A new view of transcriptome complexity and regulation through the lens of local splicing variations , 2016, eLife.

[8]  Serban Nacu,et al.  Fast and SNP-tolerant detection of complex variants and splicing in short reads , 2010, Bioinform..

[9]  David P. Kreil,et al.  A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium , 2014, Nature Biotechnology.

[10]  Lan Lin,et al.  rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data , 2014, Proceedings of the National Academy of Sciences.

[11]  Knut Reinert,et al.  Fast and accurate read mapping with approximate seeds and multiple backtracking , 2012, Nucleic acids research.

[12]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[13]  Yadong Wang,et al.  deBGA: read alignment with de Bruijn graph-based seed and extension , 2016, Bioinform..

[14]  Pedro G. Ferreira,et al.  Transcriptome and genome sequencing uncovers functional variation in humans , 2013, Nature.

[15]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[16]  O. Kallioniemi,et al.  FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data , 2014, bioRxiv.

[17]  Rob Patro,et al.  Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms , 2013, Nature Biotechnology.

[18]  Richard M. Karp,et al.  Faster and More Accurate Sequence Alignment with SNAP , 2011, ArXiv.

[19]  Colin N. Dewey,et al.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome , 2011, BMC Bioinformatics.

[20]  Fatemeh Almodaresi,et al.  Improved data-driven likelihood factorizations for transcript abundance estimation , 2017, Bioinform..

[21]  A. Oshlack,et al.  JAFFA: High sensitivity transcriptome-focused fusion gene detection , 2015, Genome Medicine.

[22]  Lior Pachter,et al.  Near-optimal probabilistic RNA-seq quantification , 2016, Nature Biotechnology.

[23]  Eugene W. Myers,et al.  Suffix arrays: a new method for on-line string searches , 1993, SODA '90.

[24]  Heng Li Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM , 2013, 1303.3997.

[25]  M. Axtell Butter: High-precision genomic alignment of small RNA-seq data , 2014, bioRxiv.

[26]  Lior Pachter,et al.  Pseudoalignment for metagenomic read assignment , 2015, Bioinform..

[27]  Martin Sosic,et al.  Edlib: a C/C++ library for fast, exact sequence alignment using edit distance , 2016, bioRxiv.

[28]  Rob Patro,et al.  Salmon provides fast and bias-aware quantification of transcript expression , 2017, Nature Methods.

[29]  Thomas R. Gingeras,et al.  STAR: ultrafast universal RNA-seq aligner , 2013, Bioinform..

[30]  Knut Reinert,et al.  RazerS 3: Faster, fully sensitive read mapping , 2012, Bioinform..

[31]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.