NanoVar: accurate characterization of patients’ genomic structural variants using low-depth nanopore sequencing

Despite the increasing relevance of structural variants (SVs) in the development of many human diseases, the discovery of novel pathological SVs remains impeded, partly because accurate and routine SV characterization in patients is challenging. The recent advent of third-generation sequencing (3GS) technologies promises better characterization of genomic aberrations by virtue of longer reads. However, applications of 3GS are restricted by high sequencing error rates and low sequencing throughput. To overcome these limitations, we present NanoVar, an accurate, rapid, low-depth (4X) 3GS SV caller that uses long reads generated by Oxford Nanopore Technologies. NanoVar employs split reads and hard-clipped reads for SV detection and uses a neural network classifier to enrich for true SVs. In simulated data, NanoVar demonstrated the highest SV detection accuracy (F1 score = 0.91) among the long-read SV callers compared, using 12 gigabases (4X) of sequencing data. In patient samples, besides detecting genomic aberrations, NanoVar also uncovered many normal alternative sequences or alleles that were present in healthy individuals. The low sequencing depth requirement of NanoVar enables the use of nanopore sequencing for accurate SV characterization at a lower sequencing cost, an approach compatible with clinical studies and large-scale SV-association research.
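The two figures quoted above, "12 gigabases (4X)" and "F1 score = 0.91", follow from standard definitions, and the arithmetic can be sketched briefly. The snippet below is an illustration only, not part of NanoVar: the function names are hypothetical, the human genome size of roughly 3.1 gigabases is an assumption not stated in the abstract, and the true-positive, false-positive and false-negative counts are made-up values chosen to yield an F1 of 0.91, not results from the study.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Mean depth of coverage: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for a set of SV calls."""
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both 0
    precision = tp / (tp + fp)  # fraction of called SVs that are real
    recall = tp / (tp + fn)     # fraction of real SVs that were called
    return 2 * precision * recall / (precision + recall)


# Assumption: a human genome of about 3.1 gigabases.
ASSUMED_GENOME_SIZE = 3.1e9

depth = sequencing_depth(12e9, ASSUMED_GENOME_SIZE)
print(f"12 Gb over an assumed 3.1 Gb genome: {depth:.2f}X")

# Hypothetical counts, for illustration only (not data from the paper).
print(f"F1 for tp=91, fp=9, fn=9: {f1_score(91, 9, 9):.2f}")
```

Under these assumptions, 12 gigabases corresponds to a mean depth a little under 4X, which matches the rounding used in the abstract. The F1 score rewards a caller only when precision and recall are both high, which is why it is a common single-number summary for SV detection accuracy.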
