NextPolish: a fast and efficient genome polishing tool for long-read assembly

MOTIVATION While long-read sequencing technologies can produce genome assemblies with high contiguity, they suffer from high error rates. We therefore developed NextPolish, a tool that efficiently corrects sequence errors in genomes assembled from long reads. The tool consists of two interlinked modules: one scores and counts k-mers from high-quality short reads, and the other polishes genome assemblies that contain large numbers of base errors.

RESULTS When evaluated for speed and efficiency on a human genome and a plant (Arabidopsis thaliana) genome, NextPolish outperformed Pilon, correcting sequence errors faster and with higher correction accuracy.

AVAILABILITY NextPolish is implemented in C and Python. The source code is available from https://github.com/Nextomics/NextPolish.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
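To make the idea of counting and scoring k-mers from short reads concrete, here is a minimal Python sketch. It is not NextPolish's actual algorithm, which the abstract does not detail. It assumes a simplified scheme in which each candidate base at a draft position is scored by the summed short-read counts of the k-mers that span it. The function names (`count_kmers`, `score_base`, `polish_base`) and the toy reads are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k):
    """Count canonical k-mers across short reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def score_base(seq, pos, base, counts, k):
    """Sum the read counts of every k-mer covering `pos` when `base` is placed there."""
    candidate = seq[:pos] + base + seq[pos + 1:]
    start = max(0, pos - k + 1)
    end = min(pos, len(candidate) - k)
    return sum(counts[canonical(candidate[i:i + k])] for i in range(start, end + 1))


def polish_base(seq, pos, counts, k):
    """Pick the base with the highest k-mer support at `pos` (illustrative only)."""
    return max("ACGT", key=lambda b: score_base(seq, pos, b, counts, k))


if __name__ == "__main__":
    reads = ["ACGTTGCA", "ACGTTGCA", "CGTTGC"]  # toy short reads
    draft = "ACGATGCA"                           # draft with an error at index 3
    counts = count_kmers(reads, k=3)
    print(polish_base(draft, 3, counts, k=3))
```

In this toy case the erroneous base in the draft has no k-mer support in the reads, so the best-supported alternative is chosen. A real polisher must also handle insertions, deletions, heterozygous sites, and repeats, none of which this sketch attempts.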
