Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph

Despite recent advances in the length and accuracy of long-read data, building haplotype-resolved genome assemblies from telomere to telomere still requires considerable computational resources. In this study, we present an efficient de novo assembly algorithm that combines multiple sequencing technologies to scale up population-wide telomere-to-telomere assemblies. Using twenty-two human and two plant genomes, we demonstrate that our algorithm is around an order of magnitude cheaper than existing methods while producing better diploid and haploid assemblies. Notably, our algorithm is the only feasible solution for the haplotype-resolved assembly of polyploid genomes.
