Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing

Background: Short-read sequencing technologies have made microbial genome sequencing cheap and accessible. However, closing genomes is often costly, and assembling short reads from genomes that are repetitive and/or have extreme %GC content remains challenging. Long-read, single-molecule sequencing technologies such as the Oxford Nanopore MinION have the potential to overcome these difficulties, although the best approach for harnessing their potential remains poorly evaluated.

Results: We sequenced nine bacterial genomes spanning a wide range of GC contents using Illumina MiSeq and Oxford Nanopore MinION sequencing technologies to determine the advantages of each approach, both individually and combined. Assemblies using only MiSeq reads were highly accurate but lacked contiguity, a deficiency that was partially overcome by adding MinION reads to these assemblies. Even more contiguous genome assemblies were generated by using MinION reads for initial assembly, but these assemblies were more error-prone and required further polishing. This was especially pronounced when Illumina libraries were biased, as was the case for our strains with both high and low GC content. Increased genome contiguity dramatically improved the annotation of insertion sequences and secondary metabolite biosynthetic gene clusters, likely because long reads can disambiguate these highly repetitive but biologically important genomic regions.

Conclusions: Genome assembly using short reads is challenged by repetitive sequences and extreme GC contents. Our results indicate that these difficulties can be largely overcome by using single-molecule, long-read sequencing technologies such as the Oxford Nanopore MinION. Using MinION reads for assembly followed by polishing with Illumina reads generated the most contiguous genomes with sufficient accuracy to enable the accurate annotation of important but difficult-to-sequence genomic features such as insertion sequences and secondary metabolite biosynthetic gene clusters. The combination of Oxford Nanopore and Illumina sequencing can therefore cost-effectively advance studies of microbial evolution and genome-driven drug discovery.
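
The two quantities the abstract leans on, %GC content and assembly contiguity, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `gc_percent` and `n50` are hypothetical, and N50 is used here only as one common summary of contiguity, which the abstract itself does not name.

```python
from typing import Iterable


def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) of a sequence.

    Illustrative helper (hypothetical name); ambiguous bases such as N
    are ignored rather than counted in the denominator.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def n50(contig_lengths: Iterable[int]) -> int:
    """N50: the length L such that contigs of length >= L together cover
    at least half of the total assembly length.

    Assumption: N50 stands in for "contiguity" here; returns 0 for an
    empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Toy inputs, not data from the study.
    print(gc_percent("GGCCAATT"))
    print(n50([80, 70, 50, 40, 30, 20]))
```

Under this measure, a fragmented short-read assembly made of many small contigs yields a low N50, whereas a closed single-chromosome assembly yields an N50 equal to the chromosome length.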
