De novo sequencing and variant calling with nanopores using PoreSeq

The accuracy of sequencing single DNA molecules with nanopores is continually improving, but de novo genome sequencing and assembly using only nanopore data remain challenging. Here we describe PoreSeq, an algorithm that identifies and corrects errors in nanopore sequencing data and improves the accuracy of de novo genome assembly with increasing coverage depth. The approach relies on modeling the possible sources of uncertainty that occur as DNA transits through the nanopore and finds the sequence that best explains multiple reads of the same region. PoreSeq increases nanopore sequencing read accuracy of M13 bacteriophage DNA from 85% to 99% at 100× coverage. We also use the algorithm to assemble the Escherichia coli genome at 30× coverage and the λ genome at a range of coverages from 3× to 50×. Additionally, we classify sequence variants at an order of magnitude lower coverage than is possible with existing methods.
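The idea of finding "the sequence that best explains multiple reads of the same region" can be illustrated with a toy sketch. This is not the PoreSeq implementation: the abstract describes a model of the uncertainty in the nanopore signal, whereas the sketch below swaps in plain edit distance between base strings as a stand-in for that model. The function names (`edit_distance`, `total_cost`, `single_edits`, `refine_consensus`) and the example reads are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def total_cost(candidate: str, reads: list[str]) -> int:
    """Summed distance to all reads; a crude proxy for a negative log-likelihood."""
    return sum(edit_distance(candidate, r) for r in reads)


def single_edits(seq: str):
    """Yield every sequence one insertion, deletion, or substitution away from seq."""
    for i in range(len(seq) + 1):
        for base in "ACGT":
            yield seq[:i] + base + seq[i:]              # insertion
        if i < len(seq):
            yield seq[:i] + seq[i + 1:]                 # deletion
            for base in "ACGT":
                if base != seq[i]:
                    yield seq[:i] + base + seq[i + 1:]  # substitution


def refine_consensus(reads: list[str], max_rounds: int = 50) -> str:
    """Greedy search: start from the first read and accept the best single
    edit each round, stopping when no edit lowers the total cost."""
    candidate = reads[0]
    best = total_cost(candidate, reads)
    for _ in range(max_rounds):
        mutant = min(single_edits(candidate), key=lambda s: total_cost(s, reads))
        cost = total_cost(mutant, reads)
        if cost >= best:
            break
        candidate, best = mutant, cost
    return candidate


# Hypothetical noisy reads of the same short region (made up for this example).
reads = ["ACGTAGC", "ACCTTAGC", "ACGTTAGGC", "ACGTTAGC", "ACGTTAGC"]
consensus = refine_consensus(reads)
```

The sketch starts from a read that contains a deletion and still converges on a sequence supported by the other reads, which is the sense in which additional coverage can correct individual read errors. A greedy search like this can stall in a local optimum, and the cost of each round grows with both sequence length and read count, so it is only suitable for short illustrative inputs.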
