Leveraging Base Pair Mammalian Constraint to Understand Genetic Variation and Human Disease

Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single-base phyloP scores from the whole-genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability, more so than any other functional annotation. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.

Voichita D. Marinescu | Andreas R. Pfenning | Graham M. Hughes | BaDoi N. Phan | Irene M. Kaplow | Pardis C Sabeti | N. Wray | F. Di Palma | B. Birren | K. Lindblad-Toh | Z. Weng | M. Diekhans | K. Pollard | T. Marquès-Bonet | H. Clawson | B. Paten | O. Wallerman | W. Murphy | R. Hubley | E. Karlsson | E. Teeling | A. Navarro | G. Muntané | Jia Wen | M. Springer | E. Eizirik | Jill E. Moore | J. Szatkiewicz | S. Gazal | B. Shapiro | H. Lewin | Steven K. Reilly | Oliver A. Ryder | J. Zeng | D. Ray | Jason Turner-Maier | C. Steiner | Sharadha Sakthikumar | Jeremy Johnson | K. Fan | J. Meadows | Diana D. Moreno-Santillán | L. Huckins | S. Kozyrev | Zhili Zheng | M. Christmas | Patrick F. Sullivan | K. Koepfli | James R Xue | Morgan E. Wirthlin | Ross Swofford | G. Hickey | Jessica S. Johnson | Abigail L. Lind | Joana Damas | Kathleen Morrill | Nicole M. Foley | J. Gatesy | Alyssa J. Lawler | Joy-El R B Talbot | T. Lehmann | Kathleen C. Keough | Ananya Roy | K. Forsberg-Nilsson | D. Genereux | Xue Li | Chaitanya Srinivasan | E. Sundström | Daniel E. Schäffer | David Juan | S. Yao | Gregory R. Andrews | M. Nweeia | B. Kirilenko | S. Ortmann | Arian F. A. Smit | Aryn P. Wilder | Aitor Serres | J. Nordin | Xue Li | Juehan Wang | Quan Sun | P. Sullivan | Jiawen Chen | Chao Wang | I. Ruf | A. Valenzuela | Jessica M. Storer | M. Bianchi | Amanda Kowalczyk | Yun Li | C. Lawless | D. Levesque | Xiaomeng Zhang | Wynn K. Meyer | Jeb Rosen | A. Breit | Victor C. Mason | Andrew J. Harris | K. Bredemeyer | Nicole S. Paulat | Austin B. Osmanski | Q. Sun | Michael Hiller | L. R. Moreira | Megan A. Supple | J. Korstian | Franziska Wagner | Ava Mackay-Smith | Jenna R. Grimshaw | Michaela K. Halsey | Kevin A. M. Sullivan | H. Pratt | Allyson Hindle | Louise Ryan | Linda Goodman | Michael X. Dong | Joel C. Armstrong | Ananya Roy | James R. Xue | Gregory Andrews | Cornelia Fanter | Carlos J. Garcia | Klaus‐Peter Koepfli | Graham M. Hughes | Jennifer M. Korstian | Jeremy Johnson | Tomàs Marquès-Bonet | Shuyang Yao | Bogdan M. Kirilenko | A. Pfenning | Jian Zeng | Laura M. Huckins | Ana M. Breit
