Fast and sensitive mapping of nanopore sequencing reads with GraphMap

Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads. It progressively refines candidate alignments to robustly handle potentially high error rates, and uses a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10–80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with 15% higher sensitivity than the next best mapper, precise detection of structural variants ranging from 100 bp to 4 kbp in length, and species- and strain-specific identification of pathogens using MinION reads. GraphMap is available as open source under the MIT license at https://github.com/isovic/graphmap.
