HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease

More than 90% of common variants associated with complex traits do not affect proteins directly; instead, they affect the circuits that control gene expression. This has increased the urgency of understanding the regulatory genome as a key component for translating genetic results into mechanistic insights and, ultimately, therapeutics. To address this challenge, we developed HaploReg (http://compbio.mit.edu/HaploReg) to aid the functional dissection of genome-wide association study (GWAS) results, the prediction of putative causal variants in haplotype blocks, the prediction of likely cell types of action, and the prediction of candidate target genes by systematic mining of comparative, epigenomic and regulatory annotations. Since we first launched the website in 2011, we have greatly expanded HaploReg: we increased the number of chromatin state maps to 127 reference epigenomes from ENCODE 2012 and Roadmap Epigenomics, incorporated regulator binding data, expanded regulatory motif disruption annotations, and integrated expression quantitative trait locus (eQTL) variants and their tissue-specific target genes from GTEx, Geuvadis and other recent studies. We present these updates as HaploReg v4 and illustrate a use case of HaploReg for attention deficit hyperactivity disorder (ADHD)-associated SNPs with putative brain regulatory mechanisms.
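Predicting putative causal variants within haplotype blocks implies first collecting the variants in linkage disequilibrium (LD) with each GWAS SNP. As an illustration of that LD-expansion step, the sketch below computes the standard pairwise r² statistic from phased biallelic haplotypes and keeps every variant above a cutoff. It is not HaploReg's implementation: the function names `r_squared` and `ld_proxies`, the toy haplotypes, the hypothetical variant IDs, and the r² ≥ 0.8 default cutoff are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Pairwise LD r^2 between two biallelic variants.

    Each argument holds one allele (0 or 1) per phased haplotype,
    with haplotypes in the same order for both variants.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n  # frequency of allele 1 at variant A
    p_b = sum(hap_b) / n  # frequency of allele 1 at variant B
    # Frequency of haplotypes carrying allele 1 at both variants.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic variant has undefined LD; treat it as 0 here
    return d * d / denom


def ld_proxies(query: str, haplotypes: Dict[str, List[int]],
               threshold: float = 0.8) -> List[str]:
    """Return variants (including the query) with r^2 >= threshold to the query.

    The 0.8 default is an illustrative assumption, not a documented setting.
    """
    q = haplotypes[query]
    return sorted(v for v, h in haplotypes.items() if r_squared(q, h) >= threshold)


# Hypothetical variant IDs and eight toy phased haplotypes.
haps = {
    "rs_query": [1, 1, 1, 1, 0, 0, 0, 0],
    "rs_tight": [1, 1, 1, 1, 0, 0, 0, 0],  # perfect LD with the query
    "rs_near":  [1, 1, 1, 0, 0, 0, 0, 0],  # partial LD (r^2 = 0.6)
    "rs_far":   [1, 0, 1, 0, 1, 0, 1, 0],  # no LD (r^2 = 0)
}

if __name__ == "__main__":
    print(ld_proxies("rs_query", haps))  # ['rs_query', 'rs_tight']
```

Only `rs_query` and `rs_tight` pass the cutoff; each surviving variant would then be a candidate for annotation with chromatin states, motif disruptions and eQTL evidence.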
