A Long-Read Sequencing Approach for Direct Haplotype Phasing in Clinical Settings

The reconstruction of individual haplotypes can facilitate the interpretation of disease risks; however, high costs and technical challenges still hinder haplotype assessment in clinical settings. Second-generation sequencing is the gold standard for variant discovery, but because it produces short reads covering small genomic regions, it allows only indirect haplotyping based on statistical methods. In contrast, third-generation methods such as the nanopore sequencing platform developed by Oxford Nanopore Technologies (ONT) generate long reads that can be used for direct haplotyping, with fewer drawbacks. However, robust standards for variant phasing in ONT-based targeted resequencing are not yet available. In this study, we present a streamlined proof-of-concept workflow for variant calling and phasing based on ONT data in a clinically relevant 12-kb region of the APOE locus, a hotspot for variants and haplotypes associated with aging-related diseases and longevity. Starting with sequencing data from simple amplicons of the target locus, we demonstrate that ONT data allow for reliable single-nucleotide variant (SNV) calling and phasing from as few as 60 reads, although the recognition of indels is less efficient. Even so, we identified the best combination of ONT read sets (600) and software (BWA/Minimap2 and HapCUT2) that enables full haplotype reconstruction when both SNVs and indels have been identified previously using a highly accurate sequencing platform. In conclusion, we established a rapid and inexpensive workflow for variant phasing based on ONT long reads. The workflow allows multiple samples to be analyzed in parallel and can easily be implemented in routine clinical practice, including diagnostic testing.
