Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes

Illumina sequencing allows rapid, cheap and accurate whole-genome analysis of bacteria, but short reads (<300 bp) usually do not enable complete genome assembly. Long-read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods affect the accuracy of hybrid assembly. A largely automated assembly process is also crucial for high-throughput reconstruction of complete bacterial genomes, because it avoids multiple bespoke filtering and data-manipulation steps. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing together with long reads from either the Oxford Nanopore Technologies (ONT) or the Pacific Biosciences (PacBio) SMRT sequencing platform. We chose isolates from the family Enterobacteriaceae because these frequently have highly plastic, repetitive genetic structures, and complete genome reconstruction for these species is relevant to a precise understanding of the epidemiology of antimicrobial resistance. We assembled genomes de novo with the hybrid assembler Unicycler, comparing different read-processing strategies, and also compared hybrid assembly with long-read-only assembly using Flye followed by short-read polishing with Pilon. Hybrid assembly with either PacBio or ONT reads enabled high-quality genome reconstruction and, in terms of accuracy and completeness, outperformed the long-read assembly and polishing approach we evaluated. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, and at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.
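Judging whether a genome is "fully resolved" without manual steps requires some automatic summary of each assembly. The sketch below is one illustrative way to do this, not the criterion used in the study. It assumes Unicycler-style FASTA headers of the form `>1 length=4823817 depth=1.00x circular=true` and treats an assembly as resolved only when every contig is marked circular. The helper names `parse_header` and `summarise_assembly` are hypothetical, and the FASTA records in the usage example are toy data.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int
    circular: bool


def parse_header(header: str) -> Contig:
    """Parse an (assumed) Unicycler-style header, e.g.
    '>1 length=4823817 depth=1.00x circular=true'.
    Missing fields default to length 0 and circular=False."""
    fields = header.lstrip(">").split()
    name = fields[0]
    # Collect key=value pairs such as length=..., depth=..., circular=...
    attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    return Contig(
        name=name,
        length=int(attrs.get("length", 0)),
        circular=attrs.get("circular", "false").lower() == "true",
    )


def summarise_assembly(fasta_text: str) -> dict:
    """Count contigs, circularised contigs and total length from FASTA text."""
    contigs = [parse_header(line) for line in fasta_text.splitlines()
               if line.startswith(">")]
    n_circular = sum(c.circular for c in contigs)
    return {
        "contigs": len(contigs),
        "circular": n_circular,
        "total_length": sum(c.length for c in contigs),
        # Simplifying heuristic: 'fully resolved' only if every contig is circular.
        "fully_resolved": len(contigs) > 0 and n_circular == len(contigs),
    }


if __name__ == "__main__":
    # Toy example: a 'chromosome' and a 'plasmid', both marked circular.
    example = (">1 length=5000 depth=1.00x circular=true\nACGT\n"
               ">2 length=1200 depth=3.10x circular=true\nACGT\n")
    print(summarise_assembly(example))
    # {'contigs': 2, 'circular': 2, 'total_length': 6200, 'fully_resolved': True}
```

Header-based counting keeps the check cheap and dependency-free. A real pipeline would likely also inspect the assembly graph (for example in Bandage) or compare against a reference, since a circular flag alone does not guarantee base-level accuracy.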
