Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study

Unleashing the power of precision medicine

Precision medicine promises the ability to identify risks and treat patients on the basis of pathogenic genetic variation. Two studies combined exome sequencing results for over 50,000 people with their electronic health records. Dewey et al. found that ~3.5% of individuals in their cohort had clinically actionable genetic variants. Many of these variants affected blood lipid levels that could influence cardiovascular health. Abul-Husn et al. extended these findings to investigate the genetics and treatment of familial hypercholesterolemia, a risk factor for cardiovascular disease, within their patient pool. Genetic screening helped identify at-risk patients who could benefit from increased treatment.

Science, this issue p. 10.1126/science.aaf6814, p. 10.1126/science.aaf7000

More than 50,000 exomes, coupled with electronic health records, inform on medically relevant genetic variants.

INTRODUCTION
Large-scale genetic studies of integrated health care populations, with phenotypic data captured natively in the documentation of clinical care, have the potential to unveil genetic associations that point the way to new biology and therapeutic targets. This setting also represents an ideal test bed for the implementation of genomics in routine clinical care in service of precision medicine.

RATIONALE
The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health System aims to catalyze genomic discovery and precision medicine by coupling high-throughput exome sequencing to longitudinal electronic health records (EHRs) of participants in Geisinger's MyCode Community Health Initiative. Here, we describe initial insights from whole-exome sequencing of 50,726 adult participants of predominantly European ancestry using clinical phenotypes derived from EHRs.
RESULTS
The median duration of EHR data associated with sequenced participants was 14 years, with a median of 87 clinical encounters, 687 laboratory tests, and seven procedures per participant. Forty-eight percent of sequenced individuals had one or more first- or second-degree relatives in the sample, and genome-wide autozygosity was similar to that of other outbred European populations. We found ~4.2 million single-nucleotide variants and insertion/deletion events, of which ~176,000 are predicted to result in loss of gene function (LoF). The overwhelming majority of these genetic variants occurred at a minor allele frequency of ≤1%, and more than half were singletons. Each participant harbored a median of 21 rare predicted LoFs. At this sample size, ~92% of sequenced genes, including genes that encode existing drug targets or confer risk for highly penetrant genetic diseases, harbor rare heterozygous predicted LoF variants. About 7% of sequenced genes contained rare homozygous predicted LoF variants in at least one individual. Linking these data to EHR-derived laboratory phenotypes revealed consequences of partial or complete LoF in humans. These included previously unidentified associations between predicted LoFs in CSF2RB and basophil and eosinophil counts, as well as EGLN1-associated erythrocytosis segregating in genetically identified family networks. Using predicted LoFs as a model for drug target antagonism, we found associations supporting the majority of therapeutic targets for lipid lowering. To highlight the opportunity for genotype-phenotype association discovery, we performed exome-wide association analyses of EHR-derived lipid values, newly associating rare predicted LoFs and deleterious missense variants in G6PC with triglyceride levels. In a survey of 76 clinically actionable disease-associated genes, we estimated that 3.5% of individuals harbor pathogenic or likely pathogenic variants that meet criteria for clinical action. Review of the EHR uncovered findings associated with the corresponding monogenic condition in the medical records of ~65% of pathogenic variant carriers.

CONCLUSION
The findings reported here demonstrate the value of large-scale sequencing in an integrated health system population, add to the knowledge base regarding the phenotypic consequences of human genetic variation, and illustrate the challenges and promise of genomic medicine implementation. DiscovEHR provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic target discovery.

Therapeutic target validation and genomic medicine in DiscovEHR. (A) Associations between predicted LoF variants in lipid drug target genes and lipid levels. Boxes mark effect sizes, shown as absolute values in SD units; whiskers denote 95% confidence intervals. Box size is proportional to the base-10 logarithm of the number of predicted LoF carriers. (B and C) Prevalence and expressivity of clinically actionable genetic variants in 76 disease genes, according to EHR data. G76, Geisinger-76.

The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health System couples high-throughput sequencing to an integrated health care system using longitudinal electronic health records (EHRs). We sequenced the exomes of 50,726 adult participants in the DiscovEHR study to identify ~4.2 million rare single-nucleotide variants and insertion/deletion events, of which ~176,000 are predicted to result in a loss of gene function. Linking these data to EHR-derived clinical phenotypes, we find clinical associations supporting therapeutic targets, including genes encoding drug targets for lipid lowering, and identify new rare alleles associated with lipid levels and other blood-level traits. About 3.5% of individuals harbor deleterious variants in 76 clinically actionable genes. The DiscovEHR data set provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic discovery.
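Panel (A) summarizes each gene's association as an effect size in SD units with a 95% confidence interval, with box size scaled by log10 of the carrier count. The Python sketch below shows one simple way to produce such a summary for a single gene. It is an illustration under stated assumptions, not the study's model: the trait is z-scored across the cohort, predicted-LoF carriers are compared with non-carriers by an unadjusted two-group regression, and the interval uses a normal approximation (1.96). The function name `carrier_effect_sd` and the toy data are hypothetical; a real analysis of this cohort would also need covariate adjustment and a way to account for the many related participants.

```python
import math
from statistics import mean, stdev


def carrier_effect_sd(trait, is_carrier, z_crit=1.96):
    """Hypothetical helper: carrier vs. non-carrier difference in a trait, in SD units.

    Simplified sketch, not the DiscovEHR model: no covariates and no relatedness
    correction. Equivalent to OLS of the z-scored trait on a 0/1 carrier indicator.
    """
    if len(trait) != len(is_carrier):
        raise ValueError("trait and is_carrier must have the same length")
    mu, sd = mean(trait), stdev(trait)
    if sd == 0:
        raise ValueError("trait has zero variance")
    z = [(x - mu) / sd for x in trait]  # standardize so effects are in SD units

    z1 = [v for v, c in zip(z, is_carrier) if c]      # predicted-LoF carriers
    z0 = [v for v, c in zip(z, is_carrier) if not c]  # non-carriers
    n1, n0 = len(z1), len(z0)
    if n1 == 0 or n0 == 0 or n1 + n0 < 3:
        raise ValueError("need carriers, non-carriers, and at least 3 people")

    m1, m0 = mean(z1), mean(z0)
    beta = m1 - m0  # OLS slope for a binary predictor = difference in group means

    # Residual variance of the two-group fit (n - 2 degrees of freedom)
    rss = sum((v - m1) ** 2 for v in z1) + sum((v - m0) ** 2 for v in z0)
    se = math.sqrt(rss / (n1 + n0 - 2) * (1 / n1 + 1 / n0))
    ci = (beta - z_crit * se, beta + z_crit * se)  # normal-approximation 95% CI

    # Report magnitude as in the figure: flip the CI together with beta
    sign = -1.0 if beta < 0 else 1.0
    abs_ci = tuple(sorted((sign * ci[0], sign * ci[1])))
    return {
        "beta_sd": beta,
        "ci95": ci,
        "abs_beta_sd": abs(beta),
        "abs_ci95": abs_ci,
        "n_carriers": n1,
        "box_size": math.log10(n1),  # proportional to log10(carrier count)
    }


# Toy data (hypothetical, not DiscovEHR values): three carriers with lower values
trait = [100, 110, 120, 130, 140, 60, 70, 80]
carrier = [False] * 5 + [True] * 3
result = carrier_effect_sd(trait, carrier)
print(result["beta_sd"], result["abs_ci95"])
```

Reporting the magnitude (`abs_beta_sd`) mirrors the figure, which shows absolute effects; flipping the interval along with the estimate keeps the whiskers consistent with that choice. A regression with covariates would replace the two-group difference but would still yield an effect in SD units, because the outcome is standardized.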

Marylyn D. Ritchie | Scott Mellis | Alan R. Shuldiner | Matthew S. Lebo | Jeffrey Staples | Joseph B. Leader | Daniel R. Lavage | Cristopher V. Van Hout | Frederick E. Dewey | Alexander E. Lopez | John D. Overton | David J. Carey | H. Lester Kirchner | Sarah A. Pendergrass | Jeffrey S. Reid | Ingrid B. Borecki | Raghu Metpally | Lukas Habegger | Suganthi Balasubramanian | Thomas N. Person | Noura S. Abul-Husn | Alexander Hanbo Li | Jonathan S. Packer | Omri Gottesman | Anthony Marcketta | Aris Baras | Dustin N. Hartzel | Christopher D. Still | F. Daniel Davis | David H. Ledbetter | Heather Mason-Suares | Andrew J. Murphy | Nehal Gosalia | Robert H. Phillips | Semanti Mukherjee | Samantha N. Fetterolf | Monica A. Giovanni | Korey A. Kost | Manoj Kanagaraj | Lance J. Adams | Lyndon J. Mitnaul | Neil Stahl | Aris N. Economides | George D. Yancopoulos | Kimberly A. Skelding | Claudia Gonzaga-Jauregui | Michael F. Murray | Colm O'Dushlaine | Kavita Praveen | James R. Elmore | John S. Penn | Christina Austin-Tse | Shannon Bruse | William A. Faucett
