Hybrid assembly of ultra-long Nanopore reads augmented with 10x-Genomics contigs: Demonstrated with a human genome.

Third-generation sequencing (3GS) technologies generate ultra-long reads (up to 1 Mb), which makes it possible to eliminate gaps and effectively resolve repeats in genome assembly. However, 3GS technologies suffer from high base-level error rates (15%-40%) and high sequencing costs. To address these issues, the hybrid assembly strategy was developed; it combines 3GS reads with inexpensive next-generation sequencing (NGS) short reads. Here, we use 10×-Genomics® technology in place of conventional NGS short reads for the hybrid assembly of long, error-prone 3GS reads. This technology couples a novel barcoding strategy with Illumina® NGS and thereby reveals long-range sequence information. We demonstrate the feasibility of integrating 3GS with 10×-Genomics technologies as a new strategy for hybrid de novo genome assembly. To do so, we use the DBG2OLC and Sparc software packages, which the authors previously developed for conventional hybrid assembly. Using a human genome as an example, we show that with only 7× coverage of ultra-long Nanopore® reads, augmented with 10×-Genomics reads, our approach achieves nearly the same assembly quality as a non-hybrid assembly with 35× coverage of Nanopore reads. Compared with an assembly from 10×-Genomics reads alone, our assembly is gapless at a slightly higher cost. These results suggest that our hybrid assembly of ultra-long 3GS reads augmented with 10×-Genomics reads offers a low-cost and computationally lightweight solution for extending the wide applications of 3GS technologies. It costs less than ¼ as much as the non-hybrid assembly and took 109 calendar hours, with a peak memory usage of 61 GB, on a dual-CPU office workstation.
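The coverage comparison above can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' cost accounting. It assumes that Nanopore sequencing cost scales linearly with coverage and that the 10×-Genomics data add a fixed cost. The genome size of 3.1 Gb is an assumed approximation, and the helper names (`required_yield_gb`, `relative_cost`, `max_linked_read_cost`) are hypothetical. Costs are expressed in units of "1× Nanopore coverage", so no real prices are involved.

```python
# Minimal sketch of the coverage/cost arithmetic behind the hybrid vs. non-hybrid
# comparison. ASSUMPTIONS (not from the paper): Nanopore cost scales linearly with
# coverage, and the 10x-Genomics data add a fixed cost. No real prices are used.

HUMAN_GENOME_GB = 3.1  # assumed approximate human genome size (Gb)


def required_yield_gb(coverage, genome_size_gb=HUMAN_GENOME_GB):
    """Sequencing yield (Gb) needed for a given fold coverage: coverage * genome size."""
    return coverage * genome_size_gb


def relative_cost(linked_read_cost, cost_per_x, hybrid_cov=7.0, nonhybrid_cov=35.0):
    """Hybrid cost (hybrid_cov x Nanopore + 10x data) / non-hybrid cost (nonhybrid_cov x Nanopore)."""
    hybrid = hybrid_cov * cost_per_x + linked_read_cost
    nonhybrid = nonhybrid_cov * cost_per_x
    return hybrid / nonhybrid


def max_linked_read_cost(target_ratio, cost_per_x, hybrid_cov=7.0, nonhybrid_cov=35.0):
    """Largest 10x-data cost that keeps relative_cost at or below target_ratio.

    Solves (h*c + L) / (n*c) = r for L, giving L = (r*n - h) * c.
    """
    return (target_ratio * nonhybrid_cov - hybrid_cov) * cost_per_x


if __name__ == "__main__":
    print(f"7x:  ~{required_yield_gb(7):.1f} Gb of Nanopore reads")
    print(f"35x: ~{required_yield_gb(35):.1f} Gb of Nanopore reads")
    # Price unit = cost of 1x Nanopore coverage, so results are in "x-equivalents".
    print(f"lower bound on cost ratio (free 10x data): {relative_cost(0.0, 1.0):.2f}")
    print(f"10x budget for a 1/4 ratio: {max_linked_read_cost(0.25, 1.0):.2f} x-equivalents")
```

Under these assumptions, the hybrid-to-non-hybrid cost ratio can never fall below 7/35 = 0.2. A ratio below ¼ therefore requires the 10×-Genomics data to cost less than the equivalent of about 1.75× of Nanopore coverage.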
