Functional Interpretation of Genetic Variants Using Deep Learning Predicts Impact on Epigenome

Identifying causal variants underlying disease risk and adopting personalized medicine are both currently limited by the challenge of interpreting the functional consequences of genetic variants. Predicting the functional effects of disease-associated protein-coding variants is increasingly routine. Yet the vast majority of risk variants are non-coding, and predicting their functional consequences and prioritizing them for functional validation remain major challenges. Here we develop a deep learning model that accurately predicts locus-specific signals from four epigenetic assays using only DNA sequence as input. Given the epigenetic signal predicted from DNA sequence for the reference and alternative alleles at a given locus, we generate a score of the predicted epigenetic consequences for 438 million variants. These impact scores are assay-specific, are predictive of allele-specific transcription factor binding, and are enriched for variants associated with gene expression and disease risk. Nucleotide-level functional consequence scores for non-coding variants can refine the mechanism of known causal variants, identify novel risk variants, and prioritize downstream experiments.
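The reference-versus-alternative comparison described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the impact score is computed here as a simple difference (alternative minus reference) of predicted signals, and `toy_predict_signal` is a hypothetical stand-in (GC fraction of the window) for a trained sequence-to-signal model. All function names are illustrative only.

```python
from typing import Callable


def toy_predict_signal(seq: str) -> float:
    """Hypothetical stand-in for a trained model: GC fraction of the window.

    A real model would map the sequence to a predicted assay signal.
    """
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def substitute_allele(ref_window: str, offset: int, ref: str, alt: str) -> str:
    """Return the window with the reference allele at `offset` replaced by `alt`."""
    observed = ref_window[offset:offset + len(ref)].upper()
    if observed != ref.upper():
        raise ValueError("reference allele does not match the sequence window")
    return ref_window[:offset] + alt + ref_window[offset + len(ref):]


def impact_score(
    ref_window: str,
    offset: int,
    ref: str,
    alt: str,
    predict: Callable[[str], float] = toy_predict_signal,
) -> float:
    """Assumed scoring rule: predicted signal for alt minus predicted signal for ref."""
    alt_window = substitute_allele(ref_window, offset, ref, alt)
    return predict(alt_window) - predict(ref_window)


# Example: an A>G substitution at offset 4 of a 10-bp window.
window = "ACGTACGTAC"
score = impact_score(window, 4, "A", "G")
print(f"impact score: {score:+.3f}")
```

In this sketch a positive score means the alternative allele is predicted to increase the signal and a negative score means it is predicted to decrease it. Because the predictor is passed in as an argument, one model per assay could be supplied to obtain assay-specific scores.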
