Detection of Regulatory SNPs in Human Genome Using ChIP-seq ENCODE Data

A large proportion of the SNPs identified in genome-wide association studies are non-coding, which heightens the need for effective identification of regulatory SNPs (rSNPs) among them. However, this task remains challenging because the regulatory part of the human genome is annotated far more poorly than the coding regions. Here we describe an approach that aggregates the whole set of ENCODE ChIP-seq data to search for rSNPs, and we provide experimental evidence of its effectiveness. The algorithm is based on the assumption that enrichment of a genomic region with transcription factor (TF) binding loci (ChIP-seq peaks) indicates a regulatory function, so that SNPs located in such a region are more likely to influence transcription regulation. To verify that the approach preferentially selects functionally meaningful SNPs, we performed an enrichment analysis of several human SNP datasets associated with phenotypic manifestations. All of these datasets were significantly enriched in SNPs falling within regions covered by multiple ChIP-seq peaks compared with randomly selected SNPs. For experimental verification, 40 SNPs falling within overlapping regions of at least 7 TF binding loci were selected from OMIM. The effect of each SNP on the binding of the DNA fragment containing it to nuclear proteins from four human cell lines (HepG2, HeLaS3, HCT-116, and K562) was tested by EMSA. A marked change in the binding pattern was observed for 29 SNPs, and 6 more SNPs showed less pronounced changes. Taken together, the results demonstrate an effective way to search for potential rSNPs with the aid of the ChIP-seq data provided by the ENCODE project.
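The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_index`, `peak_count`, `candidate_rsnps`), the tuple layout, and the 0-based half-open coordinate convention are all assumptions made for illustration. It also assumes that "at least 7 TF binding loci" means at least 7 peaks covering the SNP position, which the abstract does not spell out.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(peaks):
    """Group peaks by chromosome and sort them by start coordinate.

    peaks: iterable of (chrom, start, end, tf) tuples.
    Coordinates are assumed 0-based, half-open [start, end) (an assumption).
    """
    index = defaultdict(list)
    for chrom, start, end, tf in peaks:
        index[chrom].append((start, end, tf))
    for intervals in index.values():
        intervals.sort()
    return index


def peak_count(index, chrom, pos):
    """Count the peaks on `chrom` that cover position `pos`."""
    intervals = index.get(chrom, [])
    # Only peaks with start <= pos can cover pos; bisect finds that prefix.
    hi = bisect_right(intervals, (pos, float("inf"), ""))
    # Simple linear scan of the prefix; adequate for a sketch, not optimized.
    return sum(1 for _start, end, _tf in intervals[:hi] if end > pos)


def candidate_rsnps(snps, index, min_peaks=7):
    """Return IDs of SNPs covered by at least `min_peaks` peaks.

    snps: iterable of (snp_id, chrom, pos) tuples (hypothetical layout).
    min_peaks=7 mirrors the threshold quoted in the abstract.
    """
    return [snp_id for snp_id, chrom, pos in snps
            if peak_count(index, chrom, pos) >= min_peaks]


# Toy data, invented purely for illustration.
toy_peaks = [("chr1", 90 - i, 110 + i, f"TF{i}") for i in range(7)]
toy_peaks += [("chr1", 480, 520, "TF_A"), ("chr1", 490, 510, "TF_B")]
toy_snps = [("snpA", "chr1", 100), ("snpB", "chr1", 500), ("snpC", "chr2", 100)]

toy_index = build_index(toy_peaks)
selected = candidate_rsnps(toy_snps, toy_index)
```

In this toy example only `snpA` passes the threshold, since it lies under seven overlapping peaks, while `snpB` is covered by two and `snpC` by none. For genome-scale data an interval tree or a dedicated genomics toolkit would be a more practical choice than the linear scan used here.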
