A spectral algorithm for fast de novo layout of uncorrected long nanopore reads

Motivation: New long-read sequencers promise to transform sequencing and genome assembly by producing reads tens of kilobases long. However, their high error rate significantly complicates assembly and requires expensive correction steps before the reads can be laid out with standard assembly engines.

Results: We present an original and efficient spectral algorithm to lay out uncorrected nanopore reads, and its seamless integration into a straightforward overlap/layout/consensus (OLC) assembly scheme. The method is shown to assemble Oxford Nanopore reads from several bacterial genomes into good-quality (~99% identity to the reference) genome-sized contigs, while yielding more fragmented assemblies from the eukaryotic microbe Saccharomyces cerevisiae.

Availability and implementation: https://github.com/antrec/spectrassembler

Contact: antoine.recanati@inria.fr

Supplementary information: Supplementary data are available at Bioinformatics online.
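To make the idea of a spectral layout concrete, here is a minimal sketch of generic spectral ordering: items are sorted by the entries of the Fiedler vector (the eigenvector for the second-smallest eigenvalue) of the graph Laplacian built from a pairwise similarity matrix. This is an illustration under stated assumptions, not the spectrassembler implementation. The function names `spectral_order` and `toy_overlap_matrix` are hypothetical, and the toy matrix stands in for read-overlap scores that a real pipeline would obtain from an overlapper.

```python
import numpy as np


def spectral_order(similarity):
    """Return an ordering of items from the Fiedler vector of the Laplacian.

    Assumes `similarity` is a square, non-negative matrix whose graph is
    connected (hypothetical stand-in for pairwise read-overlap scores).
    """
    S = np.asarray(similarity, dtype=float)
    S = (S + S.T) / 2.0            # enforce symmetry
    np.fill_diagonal(S, 0.0)       # self-similarity does not affect the Laplacian
    laplacian = np.diag(S.sum(axis=1)) - S
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]        # eigenvector of the second-smallest eigenvalue
    return np.argsort(fiedler)


def toy_overlap_matrix(n, bandwidth):
    """Banded matrix: items close in the true order overlap more (toy data)."""
    idx = np.arange(n)
    gap = np.abs(idx[:, None] - idx[None, :])
    return np.maximum(0, bandwidth - gap).astype(float)


# Shuffle a toy set of "reads", then try to recover their order.
rng = np.random.default_rng(0)
n = 30
true_matrix = toy_overlap_matrix(n, bandwidth=5)
shuffle = rng.permutation(n)                       # shuffled[i] is true item shuffle[i]
shuffled = true_matrix[np.ix_(shuffle, shuffle)]

order = spectral_order(shuffled)
recovered = shuffle[order]                         # true positions in recovered order
print(recovered)
```

On noiseless banded data like this, the recovered sequence should match the true order up to reversal, since the sign of an eigenvector is arbitrary. Real overlap matrices are sparse and noisy, so a dense eigendecomposition as used here would likely be replaced by a sparse eigensolver, and repeats can break the banded structure that the ordering relies on.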
