Benchmarking of long-read assemblers for prokaryote whole genome sequencing

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow most prokaryote genomes to be completely assembled, with one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly.

Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of six long-read assemblers (Canu, Flye, Miniasm/Minipolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used.

Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.6 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 was the only assembler that consistently produced clean contig circularisation. Raven v0.0.5 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.3.0 were computationally efficient but more likely to produce incomplete assemblies.

Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development of long-read assembly algorithms.
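To make the contig circularisation criterion concrete, the following minimal Python sketch flags a contig whose end repeats its start, which is one way a circular replicon can be assembled without clean circularisation. It is an illustration only, not the assessment method used in the study: the function name `terminal_overlap`, the `min_overlap` parameter and the toy sequences are all hypothetical, and the comparison uses exact string matching, whereas real long-read assemblies contain errors and would need alignment-based comparison.

```python
def terminal_overlap(contig: str, min_overlap: int = 4) -> int:
    """Return the length of the longest exact match between the start and
    the end of a contig, or 0 if no match of at least min_overlap exists.

    Hypothetical helper for illustration: a non-zero result suggests the
    contig ends with sequence that duplicates its start, i.e. the circular
    replicon was not cleanly circularised.
    """
    # Try the longest candidate first; the full-length match is trivial
    # (a string always equals itself), so it is excluded.
    for k in range(len(contig) - 1, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


# Toy example (made-up sequence, assumption for illustration only).
replicon = "ACGTTGCAAGGCTTAC"
clean_contig = replicon                  # each base appears exactly once
overlapping_contig = replicon + replicon[:5]  # first 5 bases repeated at the end

print(terminal_overlap(clean_contig))
print(terminal_overlap(overlapping_contig))
```

In this sketch a cleanly circularised contig yields 0, while a contig with duplicated terminal sequence yields the number of duplicated bases that would need to be trimmed. The `min_overlap` threshold guards against short matches that occur by chance.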
