Optimized use of Oxford Nanopore flowcells for hybrid assemblies

Hybrid assemblies are highly valuable for studies of Enterobacteriaceae because they can fully resolve the structure of mobile genetic elements, such as plasmids, which carry clinically important genes (e.g. those involved in antimicrobial resistance [AMR] and virulence). The widespread application of this technique is currently limited primarily by cost. Recent data have suggested that non-inferior, and even superior, hybrid assemblies can be produced using a fraction of the total output from a multiplexed nanopore (Oxford Nanopore Technologies [ONT]) flowcell run. In this study we sought to determine the shortest flowcell running time that still yields reads suitable for high-quality hybrid assembly. We then evaluated whether the ONT wash kit might allow users to exploit shorter running times by sequencing multiple libraries per flowcell. After 24 hours of sequencing, most chromosomes and plasmids had circularised and there was no benefit associated with longer running times. Quality was similar at 12 hours, suggesting that shorter running times are likely to be acceptable for certain applications (e.g. plasmid genomics). The ONT wash kit was highly effective in removing DNA between libraries. Contamination between libraries did not appear to affect subsequent hybrid assemblies, even when the same barcodes were used successively on a single flowcell. Using shorter running times in combination with between-library nuclease washes allows at least 36 Enterobacteriaceae isolates to be sequenced per flowcell, significantly reducing the per-isolate sequencing cost. Ultimately, this will facilitate large-scale studies using hybrid assembly, advancing our understanding of the genomics of key human pathogens.

Data Summary
Raw sequencing data are available via NCBI under project accession number PRJNA604975. Sample accession numbers are provided in Table S1. Assemblies are available via Figshare https://doi.org/10.6084/m9.figshare.11816532.v1.
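The 12-hour, 24-hour and longer running times compared in the abstract can be emulated from a single long run by discarding reads that started after a chosen cutoff. The sketch below shows one way to do this and is not necessarily how this study subsampled its data. It assumes MinKNOW/Guppy-style FASTQ headers carrying a `start_time=YYYY-MM-DDTHH:MM:SSZ` field (other layouts, for example with fractional seconds, need a different pattern), approximates the run start by the earliest read, and uses hypothetical helper names (`parse_fastq`, `read_start`, `truncate_run`) with a synthetic three-read example.

```python
"""Sketch: emulate shorter nanopore runs by keeping only reads that started
within the first `hours` of a longer run. Assumes MinKNOW/Guppy-style FASTQ
headers containing a field such as `start_time=2019-07-01T10:00:00Z`."""
import re
from datetime import datetime, timedelta
from typing import Iterator, List, Optional, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities
START_TIME = re.compile(r"start_time=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z")


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield four-line FASTQ records (multi-line sequences are not handled)."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 2], lines[i + 3]


def read_start(header: str) -> datetime:
    """Extract the read start time from a FASTQ header."""
    match = START_TIME.search(header)
    if match is None:
        raise ValueError(f"no start_time field in header: {header!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")


def truncate_run(lines: List[str], hours: float,
                 run_start: Optional[datetime] = None) -> List[Record]:
    """Return records whose reads started less than `hours` after run_start."""
    records = list(parse_fastq(lines))
    if not records:
        return []
    starts = [read_start(rec[0]) for rec in records]
    if run_start is None:
        run_start = min(starts)  # approximation: earliest read marks the run start
    cutoff = run_start + timedelta(hours=hours)
    return [rec for rec, t in zip(records, starts) if t < cutoff]


# Synthetic example: reads starting 0 h, 10 h and 26 h into a run.
example = [
    "@r1 runid=x start_time=2019-07-01T10:00:00Z", "ACGT", "+", "!!!!",
    "@r2 runid=x start_time=2019-07-01T20:00:00Z", "ACGT", "+", "!!!!",
    "@r3 runid=x start_time=2019-07-02T12:00:00Z", "ACGT", "+", "!!!!",
]
for h in (12, 24, 48):
    print(h, [rec[0].split()[0] for rec in truncate_run(example, h)])
```

In practice the cutoff would be applied per barcode before hybrid assembly; anchoring on the first read rather than the true run start shifts every cutoff slightly.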
Impact Statement
Most existing sequencing data have been acquired from short-read platforms (e.g. Illumina). For some bacterial species, clinically important genes, such as those involved in antibiotic resistance and/or virulence, are carried on plasmids. Whilst Illumina sequencing is highly accurate, it is generally unable to resolve complete genomic structures because short reads cannot span long repetitive regions. Hybrid assembly uses long reads to scaffold short-read contigs together, combining the strengths of both technologies. A major factor limiting the use of hybrid assembly at scale is the cost of sequencing the same isolate with two different technologies. Here we show that high-quality hybrid assemblies can be created for most isolates using significantly shorter running times than are currently standard. We demonstrate that a simple washing step allows several libraries to be run on the same flowcell, making it possible to take advantage of shorter running times. Including a nuclease in the wash keeps contamination between libraries minimal, with no significant effect on the quality of subsequent hybrid assemblies. This approach reduces the cost of acquiring long reads by >30%, paving the way for large-scale studies using hybrid assemblies, which will likely enhance our understanding of the genomics of important human pathogens.
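The saving described here comes from amortising one flowcell over several washed libraries. The following is a minimal cost model under stated assumptions: one wash per change of library, a fixed per-isolate library-preparation cost, and three libraries of 12 barcoded isolates as an assumed split of the 36 isolates per flowcell reported in the abstract. All prices are placeholders, not ONT list prices or the figures behind the >30% estimate, and `RunPlan` and `relative_saving` are hypothetical names.

```python
"""Sketch: per-isolate long-read cost when one flowcell is reused for several
washed libraries. All prices are placeholders (hypothetical units)."""
from dataclasses import dataclass


@dataclass
class RunPlan:
    flowcell_cost: float          # price of one flowcell (placeholder)
    wash_cost: float              # cost of one between-library wash (placeholder)
    prep_cost_per_isolate: float  # library prep and barcoding per isolate (placeholder)
    libraries_per_flowcell: int   # libraries run successively on one flowcell
    isolates_per_library: int     # barcoded isolates multiplexed in each library

    def isolates(self) -> int:
        return self.libraries_per_flowcell * self.isolates_per_library

    def cost_per_isolate(self) -> float:
        # Assumption: a wash is needed only between consecutive libraries.
        washes = self.libraries_per_flowcell - 1
        shared = self.flowcell_cost + washes * self.wash_cost
        # The shared flowcell and wash cost is split evenly across all isolates.
        return shared / self.isolates() + self.prep_cost_per_isolate


def relative_saving(baseline: RunPlan, plan: RunPlan) -> float:
    """Fractional reduction in per-isolate cost of `plan` versus `baseline`."""
    return 1 - plan.cost_per_isolate() / baseline.cost_per_isolate()


if __name__ == "__main__":
    # Placeholder prices; only the 36-isolate total comes from the abstract.
    baseline = RunPlan(500.0, 50.0, 30.0, libraries_per_flowcell=1, isolates_per_library=12)
    reused = RunPlan(500.0, 50.0, 30.0, libraries_per_flowcell=3, isolates_per_library=12)
    print(reused.isolates(), round(baseline.cost_per_isolate(), 2),
          round(reused.cost_per_isolate(), 2), f"{relative_saving(baseline, reused):.1%}")
```

With these placeholder numbers the shared flowcell-and-wash cost falls from about 41.7 to about 16.7 per isolate, but the fixed preparation cost keeps the overall per-isolate saving at 15/43 (roughly 35%), which is why per-isolate library costs matter as much as flowcell reuse.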
