Clinical detection of deletion structural variants in whole-genome sequences

Optimal management of acutely ill infants with monogenic diseases requires rapid identification of causative haplotypes. Whole-genome sequencing (WGS) has been shown to identify pathogenic nucleotide variants in such infants. Deletion structural variants (DSVs, >50 nt) are implicated in many genetic diseases, and tools have been designed to identify DSVs using short-read WGS. Optimising these tools and integrating them into a WGS pipeline could improve the diagnostic sensitivity and specificity of WGS. Such integration may also shorten turnaround time relative to current copy number variant (CNV) assays, enhancing utility in acute settings. Here we describe DSV detection methods for use in WGS for rapid diagnosis in acutely ill infants: SKALD (Screening Konsensus and Annotation of Large Deletions) combines calls from two tools (BreakDancer and GenomeStrip) with calibrated filters and clinical interpretation rules. In four WGS runs, the average analytic precision (positive predictive value) of SKALD was 78%, and the average recall (sensitivity) was 27%, when compared with validated reference DSV calls. When retrospectively applied to a cohort of 36 families with acutely ill infants, SKALD identified causative DSVs in two. The first was a heterozygous deletion of exons 1–3 of MMP21, in trans with a heterozygous frame-shift deletion, in two siblings with transposition of the great arteries and heterotaxy. The second, in a newborn female with dysmorphic features, ventricular septal defect and persistent pulmonary hypertension, was a heterozygous, de novo 1p36.32p36.13 deletion for which SKALD identified the breakpoints. In summary, consensus DSV calling, implemented in an 8-h computational pipeline with parameterised filtering, has the potential to increase the diagnostic yield of WGS in acutely ill neonates and to discover novel disease genes.
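
The idea of consensus calling and of scoring calls against a reference set can be illustrated with a minimal Python sketch. It is not the SKALD implementation: the 50% reciprocal-overlap criterion, the function names and the tuple representation of a deletion are all assumptions made for illustration, and the abstract does not specify how the two callers' outputs are matched or filtered.

```python
from typing import List, Tuple

# A deletion call as (chromosome, start, end), half-open coordinates.
# This representation is an assumption for the sketch.
Deletion = Tuple[str, int, int]


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Smaller of the two fractions of each interval covered by the other."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def consensus(calls_a: List[Deletion], calls_b: List[Deletion],
              min_overlap: float = 0.5, min_size: int = 51) -> List[Deletion]:
    """Keep calls from the first caller that are longer than 50 nt and
    are supported by at least one call from the second caller.
    The 0.5 threshold is a hypothetical default, not a SKALD parameter."""
    return [
        a for a in calls_a
        if a[2] - a[1] >= min_size
        and any(reciprocal_overlap(a, b) >= min_overlap for b in calls_b)
    ]


def precision_recall(calls: List[Deletion], truth: List[Deletion],
                     min_overlap: float = 0.5) -> Tuple[float, float]:
    """Precision: fraction of calls matching a reference deletion.
    Recall: fraction of reference deletions matched by a call."""
    matched_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
    )
    matched_truth = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_overlap for c in calls)
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for the example.
caller_1 = [("chr1", 1000, 2000), ("chr1", 5000, 5040), ("chr2", 100, 900)]
caller_2 = [("chr1", 1100, 2100), ("chr2", 5000, 6000)]
reference = [("chr1", 1000, 2000), ("chr3", 10, 1000),
             ("chr4", 1, 500), ("chr5", 1, 500)]

agreed = consensus(caller_1, caller_2)
precision, recall = precision_recall(agreed, reference)
```

Requiring agreement between two callers tends to trade recall for precision, which is consistent in direction with the reported 78% precision and 27% recall, although the toy numbers here carry no such meaning.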
