Chromosome-scale, haplotype-resolved assembly of human genomes

Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with contig NG50 (the length such that contigs of this size or longer cover 50% of the known genome size) of up to 25 Mb and phased ~99.5% of heterozygous sites at 98–99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for discovering structural variants (SVs), including thousands of new transposon insertions, and for resolving highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
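NG50 is the contiguity metric quoted for the assemblies above. The sketch below shows how it is usually computed. It assumes contig lengths and an estimated genome size are both given in base pairs. The function name `ng50` is hypothetical and is not part of DipAsm. For a haplotype-resolved assembly, one would typically apply it to each haplotype's contigs separately, using the haploid genome size.

```python
from typing import Iterable


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    """Return NG50: the length L such that contigs of length >= L
    together cover at least 50% of genome_size (in base pairs).

    Returns 0 if all contigs combined cannot cover half of the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    target = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered length.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            # This contig is the one that pushes coverage past 50%.
            return length
    return 0  # assembly too small to reach half of the genome


# Toy example: contigs of 40, 30, 20 and 10 bp against a 100 bp genome.
print(ng50([40, 30, 20, 10], genome_size=100))
```

N50 uses the same calculation with `genome_size` replaced by the total assembly length. NG50 fixes the denominator at the genome size, which makes assemblies of different total lengths easier to compare.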
