DeepWAS: Directly integrating regulatory information into GWAS using deep learning supports master regulator MEF2C as risk factor for major depressive disorder

Background: Genome-wide association studies (GWAS) identify genetic variants predictive of common diseases, but they do not directly inform on the underlying molecular mechanisms. The recently developed deep learning-based method DeepSEA uses DNA sequences to predict regulatory effects for up to 1000 functional units, namely regulatory elements and chromatin features in specific cell types from the ENCODE project.

Results: We here describe “DeepWAS”, a conceptually new GWAS approach that integrates these predictions to identify SNP sets per functional unit prior to association analysis based on multiple regression. To test the power of this approach, we use genotype data from a major depressive disorder (MDD) case/control sample (total N=1,537). DeepWAS identified 177 regulatory SNPs moderating 122 functional units. MDD regulatory SNPs were located mostly in promoters, intronic and distal intergenic regions and were validated with public data. Blood regulatory SNPs were experimentally annotated with methylation quantitative trait loci (QTLs), expression quantitative trait methylation loci and expression QTLs and replicated in an independent cohort. In a joint integrative analysis, the regulatory SNPs and the independently identified annotations were connected through the transcription factors MEF2A, MEF2C and ATF2, which regulate a network of transcripts previously linked to other psychiatric disorders. In the latest GWAS for MDD, the MEF2C gene itself is within the top genome-wide significant locus.

Conclusions: DeepWAS is a novel concept with the power to directly identify individual regulatory SNPs from genotypes. In a proof-of-concept study, MEF2C was identified as a master regulator in major depression, a finding complementary to recent depression GWAS data, underlining the power of DeepWAS.
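The core idea in the Results, grouping SNPs into sets per functional unit and then testing each set jointly with multiple regression, can be illustrated with a small sketch. This is not the authors' implementation: the score threshold, the functional-unit labels, the toy genotypes and the helper names `snp_sets_per_unit` and `fit_logistic` are all hypothetical, and an unpenalized logistic regression fitted by gradient descent stands in for whatever regression model the study actually used.

```python
import math
from collections import defaultdict


def snp_sets_per_unit(predictions, threshold=0.05):
    """Group SNPs by functional unit.

    predictions: iterable of (snp_id, functional_unit, score) tuples.
    Assumption: a smaller score means a stronger predicted regulatory
    effect, and `threshold` is an arbitrary illustrative cutoff.
    """
    sets = defaultdict(list)
    for snp_id, unit, score in predictions:
        if score < threshold:
            sets[unit].append(snp_id)
    return dict(sets)


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Multiple logistic regression via batch gradient descent.

    Returns [intercept, weight_1, ..., weight_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob - label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical per-SNP predictions for two made-up functional units.
predictions = [
    ("rs1", "MEF2C|cell_line_1", 0.01),
    ("rs2", "MEF2C|cell_line_1", 0.50),
    ("rs3", "MEF2C|cell_line_1", 0.02),
    ("rs1", "ATF2|cell_line_2", 0.20),
]
unit_sets = snp_sets_per_unit(predictions)

# Toy genotypes (minor-allele counts) for 10 individuals; 0 = control, 1 = case.
genotypes = {
    "rs1": [0, 0, 1, 0, 2, 2, 1, 2, 1, 0],
    "rs3": [1, 0, 1, 0, 1, 0, 1, 1, 0, 1],
}
status = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Fit all SNPs of one functional unit jointly in a single model.
snps = unit_sets["MEF2C|cell_line_1"]
X = [[genotypes[s][i] for s in snps] for i in range(len(status))]
weights = fit_logistic(X, status)
print(dict(zip(["intercept"] + snps, weights)))
```

Fitting the SNPs of a functional unit in one model, rather than one SNP at a time, is what distinguishes this setup from a classical single-variant GWAS scan; a real analysis would also need covariates and a correction for testing many functional units.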

[1]  Christian Gieger,et al.  Connecting genetic risk to disease end points through the human blood plasma proteome , 2016, Nature Communications.

[2]  R. Tibshirani,et al.  Regression shrinkage and selection via the lasso: a retrospective , 2011 .

[3]  Florian Hahne,et al.  Visualizing Genomic Data Using Gviz and Bioconductor , 2016, Statistical Genomics.

[4]  O. Troyanskaya,et al.  Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[5]  W. Huber,et al.  Model-based variance-stabilizing transformation for Illumina microarray data , 2008, Nucleic acids research.

[6]  Trevor Hastie,et al.  Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. , 2011, Journal of statistical software.

[7]  H. Hakonarson,et al.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data , 2010, Nucleic acids research.

[8]  K. Fassbender,et al.  Sphingolipids: Important Players in Multiple Sclerosis , 2014, Cellular Physiology and Biochemistry.

[9]  Steven Henikoff,et al.  SIFT: predicting amino acid changes that affect protein function , 2003, Nucleic Acids Res..

[10]  Y. Kishimoto,et al.  Saposin proteins: structure, function, and role in human lysosomal storage disorders , 1991, FASEB journal : official publication of the Federation of American Societies for Experimental Biology.

[11]  P. Bork,et al.  A method and server for predicting damaging missense mutations , 2010, Nature Methods.

[12]  Andre Altmann,et al.  Re-Annotator: Annotation Pipeline for Microarray Probe Sequences , 2015, PloS one.

[13]  W. Rathmann,et al.  Cohort profile: the study of health in Pomerania. , 2011, International journal of epidemiology.

[14]  Kaanan P. Shah,et al.  A gene-based association method for mapping traits using reference transcriptome data , 2015, Nature Genetics.

[15]  D. Grozeva,et al.  Identification of a CACNA2D4 deletion in late onset bipolar disorder patients and implications for the involvement of voltage‐dependent calcium channels in psychiatric disorders , 2012, American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics.

[16]  E. Gamazon,et al.  Enrichment of Cis-Regulatory Gene Expression SNPs and Methylation Quantitative Trait Loci Among Bipolar Disorder Susceptibility Variants , 2012, Molecular Psychiatry.

[17]  Rui Mei,et al.  Type I interferon signaling genes in recurrent major depression: increased expression detected by whole-blood RNA sequencing , 2013, Molecular Psychiatry.

[18]  Hunter B. Fraser,et al.  Pooled ChIP-Seq Links Variation in Transcription Factor Binding to Complex Disease Risk , 2016, Cell.

[19]  Michael Q. Zhang,et al.  Integrative analysis of 111 reference human epigenomes , 2015, Nature.

[20]  Allen D. Delaney,et al.  Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing , 2007, Nature Methods.

[21]  Ayellet V. Segrè,et al.  Hundreds of variants clustered in genomic loci and biological pathways affect human height , 2010, Nature.

[22]  C. Spencer,et al.  Biological Insights From 108 Schizophrenia-Associated Genetic Loci , 2014, Nature.

[23]  Xiao-hong Ma,et al.  Methionine sulfoxide reductase A (MsrA) associated with bipolar I disorder and executive functions in A Han Chinese population. , 2015, Journal of affective disorders.

[24]  L. Berthiaume,et al.  Wnt acylation: seeing is believing. , 2014, Nature chemical biology.

[25]  S. Dudoit,et al.  Multiple Hypothesis Testing in Microarray Experiments , 2003 .

[26]  N. Wray,et al.  Genetic Differences in the Immediate Transcriptome Response to Stress Predict Risk-Related Brain Function and Psychiatric Disorders , 2015, Neuron.

[27]  Life Technologies,et al.  A map of human genome variation from population-scale sequencing , 2011 .

[28]  F. Collins,et al.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits , 2009, Proceedings of the National Academy of Sciences.

[29]  Helge G. Roider,et al.  Transcription factor binding predictions using TRAP for the analysis of ChIP-seq data and regulatory SNPs , 2011, Nature Protocols.

[30]  Eric E Schadt,et al.  Disentangling molecular relationships with a causal inference test , 2009, BMC Genetics.

[31]  Brad E. Pfeiffer,et al.  Fragile X Mental Retardation Protein Is Required for Synapse Elimination by the Activity-Dependent Transcription Factor MEF2 , 2010, Neuron.

[32]  Adrian Dobra,et al.  A BAYESIAN GRAPHICAL MODEL FOR GENOME-WIDE ASSOCIATION STUDIES (GWAS). , 2016, The annals of applied statistics.

[33]  Roberto Vera Alvarez,et al.  Quantifying deleterious effects of regulatory variants , 2016, Nucleic acids research.

[34]  O. Delaneau,et al.  Supplementary Information for ‘ Improved whole chromosome phasing for disease and population genetic studies ’ , 2012 .

[35]  Stuart C. Sealfon,et al.  CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations , 2015, Bioinform..

[36]  H. Nicolini,et al.  Methionine sulfoxide reductase: A novel schizophrenia candidate gene , 2009, American journal of medical genetics. Part B, Neuropsychiatric genetics : the official publication of the International Society of Psychiatric Genetics.

[37]  Michael A. Beer,et al.  Discriminative prediction of mammalian enhancers from DNA sequence. , 2011, Genome research.

[38]  P. Donnelly,et al.  A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies , 2009, PLoS genetics.

[39]  Warren W. Kretzschmar,et al.  Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression , 2017, Nature Genetics.

[40]  Felix Krueger,et al.  Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications , 2011, Bioinform..

[41]  Simon X. Chen,et al.  The Transcription Factor MEF2 Directs Developmental Visually Driven Functional and Structural Metaplasticity , 2012, Cell.

[42]  P. Muglia,et al.  Genome-wide association study of recurrent major depressive disorder in two European case–control cohorts , 2010, Molecular Psychiatry.

[43]  Yun Li,et al.  METAL: fast and efficient meta-analysis of genomewide association scans , 2010, Bioinform..

[44]  D. Greenwood,et al.  Height and pancreatic cancer risk: a systematic review and meta-analysis of cohort studies , 2012, Cancer Causes & Control.

[45]  John G Flannery,et al.  Massively parallel cis-regulatory analysis in the mammalian central nervous system , 2016, Genome research.

[46]  A. Feeney,et al.  YY1 plays an essential role at all stages of B-cell differentiation , 2016, Proceedings of the National Academy of Sciences.

[47]  Shane J. Neph,et al.  Systematic Localization of Common Disease-Associated Variation in Regulatory DNA , 2012, Science.

[48]  Peter Bühlmann,et al.  p-Values for High-Dimensional Regression , 2008, 0811.2177.

[49]  B. Segal Stage-specific immune dysregulation in multiple sclerosis. , 2014, Journal of interferon & cytokine research : the official journal of the International Society for Interferon and Cytokine Research.

[50]  P. Farnham,et al.  Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome , 2015, Epigenetics & Chromatin.

[51]  Qing-Yu He,et al.  ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization , 2015, Bioinform..

[52]  S. Cichon,et al.  Genome-Wide Association-, Replication-, and Neuroimaging Study Implicates HOMER1 in the Etiology of Major Depression , 2010, Biological Psychiatry.

[53]  Bjarni V. Halldórsson,et al.  Many sequence variants affecting diversity of adult human height , 2008, Nature Genetics.

[54]  Paolo Bientinesi,et al.  High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software , 2014, F1000Research.

[55]  A. Hofman,et al.  The Neuronal Transporter Gene SLC6A15 Confers Risk to Major Depression , 2011, Neuron.

[56]  Carson C Chow,et al.  Second-generation PLINK: rising to the challenge of larger and richer datasets , 2014, GigaScience.

[57]  Peter Bühlmann Regression shrinkage and selection via the Lasso: a retrospective (Robert Tibshirani): Comments on the presentation , 2011 .

[58]  T. Mikkelsen,et al.  Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. , 2013, Genome research.

[59]  Christian Gieger,et al.  Tobacco Smoking Leads to Extensive Genome-Wide Changes in DNA Methylation , 2013, PloS one.

[60]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[61]  Shambhu Bhat,et al.  CACNA1C (Cav1.2) in the pathophysiology of psychiatric disease , 2012, Progress in Neurobiology.

[62]  Michael Boehnke,et al.  LocusZoom: regional visualization of genome-wide association scan results , 2010, Bioinform..

[63]  Christian Gieger,et al.  Novel multiple sclerosis susceptibility loci implicated in epigenetic regulation , 2016, Science Advances.

[64]  K. Kendler,et al.  A Swedish national twin study of lifetime major depression. , 2006, The American journal of psychiatry.

[65]  H. Cordell,et al.  SNP Selection in Genome-Wide and Candidate Gene Studies via Penalized Logistic Regression , 2010, Genetic epidemiology.

[66]  Xinli Hu,et al.  SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci , 2014, Bioinform..

[67]  Ruth Pidsley,et al.  A data-driven approach to preprocessing Illumina 450K methylation array data , 2013, BMC Genomics.

[68]  Andrey A. Shabalin,et al.  Matrix eQTL: ultra fast eQTL analysis via large matrix operations , 2011, Bioinform..

[69]  Michael Krawczak,et al.  PopGen: Population-Based Recruitment of Patients and Controls for the Analysis of Complex Genotype-Phenotype Relationships , 2006, Public Health Genomics.

[70]  Devin C. Koestler,et al.  DNA methylation arrays as surrogate measures of cell mixture distribution , 2012, BMC Bioinformatics.

[71]  Simon G. Coetzee,et al.  motifbreakR: an R/Bioconductor package for predicting variant effects at transcription factor binding sites , 2015, Bioinform..

[72]  S. Cichon,et al.  Genome-wide association-, replication- and neuroimaging study implicates HOMER1 in the aetiology of major depression , 2010 .

[73]  T. Meehan,et al.  An atlas of active enhancers across human cell types and tissues , 2014, Nature.

[74]  J. Ioannidis,et al.  Meta-analysis methods for genome-wide association studies and beyond , 2013, Nature Reviews Genetics.

[75]  Pan Du,et al.  lumi: a pipeline for processing Illumina microarray , 2008, Bioinform..

[76]  H. Ullum,et al.  The Multiple Sclerosis Genomic Map: Role of peripheral immune cells and resident microglia in susceptibility , 2017, bioRxiv.

[77]  Mi-Sung Kim,et al.  MEF2C, a transcription factor that facilitates learning and memory by negative regulation of synapse numbers and function , 2008, Proceedings of the National Academy of Sciences.

[78]  Manolis Kellis,et al.  Chromatin-state discovery and genome annotation with ChromHMM , 2017, Nature Protocols.

[79]  Gábor Csárdi,et al.  The igraph software package for complex network research , 2006 .

[80]  P. Sullivan,et al.  Genetic epidemiology of major depression: review and meta-analysis. , 2000, The American journal of psychiatry.

[81]  Christophe Ambroise,et al.  Performance of a blockwise approach in variable selection using linkage disequilibrium information , 2015, BMC Bioinformatics.

[82]  P. Visscher,et al.  10 Years of GWAS Discovery: Biology, Function, and Translation. , 2017, American journal of human genetics.

[83]  Rafael A. Irizarry,et al.  Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays , 2014, Bioinform..

[84]  P. Visscher,et al.  Five years of GWAS discovery. , 2012, American journal of human genetics.

[85]  R. Duman,et al.  Integrating neuroimmune systems in the neurobiology of depression , 2016, Nature Reviews Neuroscience.

[86]  ENCODEConsortium,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[87]  Robert W. Mills,et al.  Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures , 2016, eLife.

[88]  K. Hansen,et al.  Functional normalization of 450k methylation array data improves replication in large cancer studies , 2014, Genome Biology.

[89]  R. Weksberg,et al.  Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray , 2013, Epigenetics.

[90]  C. Gieger,et al.  Analyzing Illumina Gene Expression Microarray Data from Different Tissues: Methodological Aspects of Data Analysis in the MetaXpress Consortium , 2012, PloS one.

[91]  A. Maghazachi,et al.  Multiple sclerosis and the role of immune cells. , 2014, World journal of experimental medicine.

[92]  Cheng Li,et al.  Adjusting batch effects in microarray expression data using empirical Bayes methods. , 2007, Biostatistics.

[93]  Benjamin A. Logsdon,et al.  Gene Expression Elucidates Functional Impact of Polygenic Risk for Schizophrenia , 2016, Nature Neuroscience.

[94]  J. Hampe,et al.  IL-6 blockade by monoclonal antibodies inhibits apolipoprotein (a) expression and lipoprotein (a) synthesis in humans[S] , 2015, Journal of Lipid Research.

[95]  Mathieu Lemire,et al.  Disease variants alter transcription factor levels and methylation of their binding sites , 2015, bioRxiv.

[96]  A. Dunning,et al.  Beyond GWASs: illuminating the dark road from association to function. , 2013, American journal of human genetics.