HyPo: Super Fast & Accurate Polisher for Long Read Genome Assemblies

Efforts to make population-scale long read genome assemblies (especially of human genomes) viable have intensified recently with the emergence of many fast assemblers. Because these fast assemblers rely on polishing to achieve accurate assemblies, the polishing step becomes crucial. We present HyPo (a Hybrid Polisher), which utilises short as well as long reads within a single run to polish a long read assembly of small and large genomes. It exploits unique genomic k-mers to selectively polish segments of contigs using partial order alignment of selected read segments. As demonstrated on human genome assemblies, HyPo generates significantly more accurate polished assemblies in about one-third of the time and with about half the memory, in comparison to Racon (currently the most widely used polisher).
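
The idea of using unique genomic k-mers to decide which contig segments need polishing can be illustrated with a minimal sketch. It is not HyPo's implementation: the function names, the count thresholds `lo` and `hi`, and the rule that a contig position is trusted when some unique k-mer covers it are all assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of (short) reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def unique_kmers(counts, lo, hi):
    """Keep k-mers whose count lies in [lo, hi].

    Assumption: counts near the sequencing depth indicate k-mers that occur
    once in the genome; very low counts suggest read errors and very high
    counts suggest repeats.
    """
    return {kmer for kmer, c in counts.items() if lo <= c <= hi}


def weak_segments(contig, solid, k):
    """Return half-open (start, end) intervals not covered by any solid k-mer.

    These are the hypothetical candidate segments for selective polishing.
    """
    covered = [False] * len(contig)
    for i in range(len(contig) - k + 1):
        if contig[i:i + k] in solid:
            for j in range(i, i + k):
                covered[j] = True

    segments = []
    start = None
    for pos, ok in enumerate(covered):
        if not ok and start is None:
            start = pos              # an uncovered run begins
        elif ok and start is not None:
            segments.append((start, pos))
            start = None             # the uncovered run ends
    if start is not None:
        segments.append((start, len(contig)))
    return segments
```

For example, with three error-free reads `ACGTTGCA`, `k = 3`, and thresholds `lo = 2`, `hi = 4`, the draft contig `ACGTAGCA` (one substitution) yields the single weak segment `(4, 5)`, so only the neighbourhood of the erroneous base would be passed on for realignment and consensus.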
