Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data

Abstract

Long reads obtained from third-generation sequencing platforms can help overcome a long-standing challenge: the de novo assembly of genomes for the analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, and the resulting genomes are of superior quality to those obtained with earlier second-generation sequencing technologies. Evaluating assemblies is important for guiding the choice of assembler for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow the list down to a few assemblers that can be applied effectively to eukaryotic assembly projects. We also highlight how best to use limited genomic resources to evaluate the genome assemblies of non-model organisms effectively.
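The abstract does not list the individual metrics used. As an illustration only, the sketch below computes one widely used contiguity statistic for assembly evaluation, N50, together with its companion L50. It assumes that contig lengths are already available as positive integers. The function name `n50_l50` is hypothetical and is not taken from any assembler or evaluation tool discussed here.

```python
from typing import Iterable, Tuple


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths (hypothetical helper).

    N50: length of the shortest contig in the smallest set of longest contigs
         whose summed length is at least half of the total assembly length.
    L50: number of contigs in that set.
    """
    # Sort contigs from longest to shortest
    lengths = sorted((int(x) for x in contig_lengths), reverse=True)
    if not lengths or any(x <= 0 for x in lengths):
        raise ValueError("contig lengths must be a non-empty collection of positive integers")

    total = sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # Compare 2 * running >= total to avoid fractional halves
        if 2 * running >= total:
            return length, count

    # Unreachable: the running sum always reaches the total on the last contig
    raise AssertionError("cumulative length never reached half of the total")
```

For example, `n50_l50([100, 80, 60, 40, 20])` returns `(80, 2)`. The total is 300, and the two longest contigs (100 + 80 = 180) are the first to cover at least half of it. Because N50 reflects only contiguity, it does not indicate whether contigs are correctly joined. Assembly evaluations therefore usually pair it with accuracy and completeness measures.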
