mapAlign: An Efficient Approach for Mapping and Aligning Long Reads to Reference Genomes

Long reads play an important role in identifying structural variants, sequencing repetitive regions, phasing alleles, and related tasks. In this paper, we propose a new approach for mapping long reads to reference genomes. We also propose a new method for generating accurate alignments between long reads and the corresponding segments of the reference genome. The new mapping algorithm is based on the longest common subsequence with distance constraints. The new (local) alignment algorithm is based on the recursive alignment of variable-size k-mers. Experiments show that our method generates better alignments, in terms of both identity and alignment score, on both Nanopore and SMRT data sets. In particular, on data sets from human individuals, our method aligns 91.53% (Nanopore) and 85.36% (SMRT) of the letters in the reads to identical letters in the reference genome. By comparison, the state-of-the-art method aligns only 88.44% and 79.08% of the read letters for the Nanopore and SMRT data sets, respectively. Our method is also faster than the state-of-the-art method.
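To make the mapping idea concrete, the following is a minimal sketch of a longest common subsequence over k-mer matches with a distance constraint. It is not the mapAlign implementation; the abstract does not specify the exact constraint, so several things here are assumptions: matches are represented as hypothetical `(read_pos, ref_pos)` anchors, the constraint is taken to be a bound `max_drift` on how much the read gap and the reference gap between consecutive anchors may differ, and a simple quadratic dynamic program is used in place of whatever faster method the paper employs.

```python
from typing import List, Tuple

# Hypothetical representation: one anchor per shared k-mer,
# stored as (position in read, position in reference).
Anchor = Tuple[int, int]


def longest_constrained_chain(anchors: List[Anchor], max_drift: int) -> List[Anchor]:
    """Return a longest chain of anchors that increases in both coordinates
    and whose consecutive anchors satisfy the (assumed) distance constraint
    |ref_gap - read_gap| <= max_drift. Runs in O(n^2) time."""
    if not anchors:
        return []
    pts = sorted(set(anchors))      # sort by read position, then reference position
    n = len(pts)
    best = [1] * n                  # best[i]: length of the best chain ending at pts[i]
    prev = [-1] * n                 # back-pointers for recovering the chain
    for i in range(n):
        qi, ri = pts[i]
        for j in range(i):
            qj, rj = pts[j]
            colinear = qj < qi and rj < ri
            within_drift = abs((ri - rj) - (qi - qj)) <= max_drift
            if colinear and within_drift and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:                # walk the back-pointers from the chain's last anchor
        chain.append(pts[end])
        end = prev[end]
    return chain[::-1]


example = [(0, 100), (10, 110), (20, 500), (30, 131), (40, 140)]
chain = longest_constrained_chain(example, max_drift=5)
```

In this example the anchor `(20, 500)` is a spurious match far from the others; the distance constraint excludes it, so the resulting chain suggests the read maps near reference position 100.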
