GWAS4D: multidimensional analysis of context-specific regulatory variant for human complex diseases and traits

Abstract Genome-wide association studies have identified thousands of susceptibility loci for many human complex traits, yet for most of these associations the true causal variants remain unknown. Tissue/cell type-specific prediction and prioritization of non-coding regulatory variants will facilitate the identification of causal variants and the underlying pathogenic mechanisms for particular complex diseases and traits. By leveraging recent large-scale functional genomics/epigenomics data, we have developed an intuitive web server, GWAS4D (http://mulinlab.tmu.edu.cn/gwas4d or http://mulinlab.org/gwas4d), that systematically evaluates GWAS signals and identifies context-specific regulatory variants. The updated web server offers six major features: (i) an updated regulatory variant prioritization method based on our new algorithm; (ii) epigenome data for 127 tissues/cell types; (iii) motifs of 1480 transcriptional regulators integrated from 13 public resources; (iv) uniformly processed Hi-C data, with significant interactions called at 5 kb resolution across 60 tissues/cell types; (v) comprehensive functional annotations of non-coding variants; and (vi) a highly interactive visualization of SNP-target interactions. Using a GWAS fine-mapped set for 161 coronary artery disease risk loci, we demonstrate that GWAS4D can efficiently prioritize disease-causal regulatory variants.
