Scaffolding and completing genome assemblies in real-time with nanopore sequencing

Third-generation sequencing technologies provide the opportunity to improve genome assemblies by generating long reads that span most repeat sequences. However, current analysis methods require substantial amounts of sequence data and computational resources to overcome the high error rates of these reads. Furthermore, they can only perform analysis after sequencing has completed, which results either in over-sequencing or in a low-quality assembly due to under-sequencing. Here we present npScarf, which can scaffold and complete short-read assemblies while the long-read sequencing run is in progress. It reports assembly metrics in real time so that the sequencing run can be terminated once an assembly of sufficient quality is obtained. In assembling four bacterial genomes and one eukaryotic genome, we show that npScarf can construct more complete and accurate assemblies while requiring less sequencing data and fewer computational resources than existing methods. Our approach offers a time- and resource-effective strategy for completing short-read assemblies.
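To illustrate the idea of terminating a run based on assembly metrics reported in real time, the following is a minimal sketch and not npScarf's actual implementation. The function names `n50` and `should_stop`, the thresholds, and the simulated snapshots are all hypothetical; the only assumption is that the contig lengths of the current scaffolded assembly are available after each update.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty set).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered: List[int] = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def should_stop(lengths: Iterable[int], max_contigs: int, min_n50: int) -> bool:
    """Hypothetical stopping rule: few enough contigs and a large enough N50."""
    lengths = list(lengths)
    return 0 < len(lengths) <= max_contigs and n50(lengths) >= min_n50


if __name__ == "__main__":
    # Simulated (made-up) snapshots of contig lengths as scaffolding progresses.
    snapshots = [
        [900_000, 800_000, 700_000, 600_000, 500_000],
        [2_000_000, 1_000_000, 500_000],
        [3_500_000],
    ]
    for step, contigs in enumerate(snapshots, start=1):
        stop = should_stop(contigs, max_contigs=1, min_n50=3_000_000)
        print(f"update {step}: contigs={len(contigs)} N50={n50(contigs)} stop={stop}")
        if stop:
            break
```

In practice the stopping criterion would depend on the genome and the purpose of the assembly; contig count and N50 are used here only because they are common, easily computed summary metrics.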
