Profiling copy number variation and disease associations from 50,726 DiscovEHR Study exomes

Copy number variants (CNVs) are a substantial source of genomic variation and contribute to a wide range of human disorders. Gene-disrupting exonic CNVs have important clinical implications, as they can underlie variability in disease presentation and susceptibility. The relationship between exonic CNVs and clinical traits has not been broadly explored at the population level, primarily due to technical challenges. We surveyed common and rare CNVs in the exome sequences of 50,726 adult DiscovEHR study participants with linked electronic health records (EHRs). We evaluated the diagnostic yield and clinical expressivity of known pathogenic CNVs and tested for association with EHR-derived serum lipids, thereby assessing the relationship between CNVs and complex traits and phenotypes in an unbiased, real-world clinical context. We identified CNVs from megabase to exon-level resolution, demonstrating reliable, high-throughput detection of clinically relevant exonic CNVs. In doing so, we created a catalog of high-confidence common and rare CNVs and refined population frequency estimates of known and novel gene-disrupting CNVs. Our survey of an unselected clinical population provides further evidence that neuropathy-associated duplications and deletions in 17p12 have similar population prevalence but are clinically under-diagnosed. Similarly, adults harboring 22q11.2 deletions frequently had EHR documentation of neurodevelopmental/neuropsychiatric disorders and congenital anomalies, but no formal genetic diagnosis of the deletion. In an exome-wide association study of lipid levels, we identified a novel five-exon duplication within LDLR segregating in a large kindred with features of familial hypercholesterolemia. Exonic CNVs provide new opportunities to understand and diagnose human disease.
