Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data

Background

ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TFs), either directly at DNA binding sites (BSs) or indirectly via other proteins. Many software tools now implement different approaches to identifying TFBSs within ChIP-Seq peaks. However, their use in interpreting ChIP-Seq data is usually complicated by the absence of direct experimental verification. This makes it difficult both to set a threshold that avoids recognizing too many false-positive BSs and to compare the actual performance of different models.

Results

Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells, we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on an existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To select prediction thresholds for the models properly, we experimentally evaluated the affinity of 64 predicted FoxA BSs using the electrophoretic mobility shift assay (EMSA), which reliably distinguishes sequences able to bind the TF. As a result, we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. Conventional position weight matrix (PWM) models performed worst, with the highest false-positive rate. In contrast, the best recognition efficiency was achieved by combining the SiteGA and diChIPMunk/ChIPMunk models, which properly identified FoxA BSs in up to 90% of loci for both the mouse and human ChIP-Seq datasets.

Conclusions

Experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS recognition in ChIP-Seq data analysis. For ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared to the more sophisticated SiteGA and de novo motif discovery methods. Combining models built on different principles allowed proper TFBSs to be identified.
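To make the role of the prediction threshold concrete, the following is a minimal sketch of conventional PWM scanning, assuming a log-odds matrix built from site counts with a uniform background. It is not the oPWM, SiteGA, ChIPMunk or diChIPMunk implementation. The count matrix, pseudocount, threshold value and all function names are hypothetical and chosen only for illustration; the matrix is a toy 4-bp motif, not a FoxA model.

```python
import math

# Hypothetical 4-bp count matrix (toy data, NOT a real FoxA model).
# The consensus of this toy motif is "ATCA".
COUNTS = {
    "A": [7, 1, 1, 7],
    "C": [1, 1, 7, 1],
    "G": [1, 1, 1, 1],
    "T": [1, 7, 1, 1],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        column = {
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        }
        matrix.append(column)
    return matrix


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(matrix, seq, threshold):
    """Return (position, strand, score) for every window scoring >= threshold.

    Positions are 0-based offsets on the forward strand.
    """
    width = len(matrix)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if not set(window) <= set("ACGT"):
                continue  # skip windows containing N or other symbols
            score = sum(col[base] for col, base in zip(matrix, window))
            if score >= threshold:
                pos = i if strand == "+" else len(seq) - width - i
                hits.append((pos, strand, score))
    return hits


def fraction_with_site(matrix, loci, threshold):
    """Fraction of loci containing at least one predicted site."""
    with_site = sum(1 for seq in loci if scan(matrix, seq, threshold))
    return with_site / len(loci)


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    for pos, strand, score in scan(pwm, "GGATCAGG", threshold=4.0):
        print(pos, strand, round(score, 2))
```

Lowering the hypothetical threshold in `scan` admits more windows and therefore more false positives, while raising it discards weaker true sites. This trade-off is what the EMSA measurements described above are used to calibrate.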

[1]  A. Visel,et al.  Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers. , 2010, Genome research.

[2]  Victor G. Levitsky,et al.  From binding motifs in Chip-seq Data to Improved Models of transcription factor binding Sites , 2013, J. Bioinform. Comput. Biol..

[3]  Tatyana I. Merkulova,et al.  Structural variants of glucocorticoid receptor binding sites and different versions of positive glucocorticoid responsive elements: Analysis of GR-TRRD database , 2009, The Journal of Steroid Biochemistry and Molecular Biology.

[4]  Wyeth W. Wasserman,et al.  The Next Generation of Transcription Factor Binding Site Prediction , 2013, PLoS Comput. Biol..

[5]  Raymond K. Auerbach,et al.  A User's Guide to the Encyclopedia of DNA Elements (ENCODE) , 2011, PLoS biology.

[6]  Alexander E. Kel,et al.  TRANSFAC® and its module TRANSCompel®: transcriptional gene regulation in eukaryotes , 2005, Nucleic Acids Res..

[7]  B. Pugh,et al.  Evidence for Functional Binding and Stable Sliding of the TATA Binding Protein on Nonspecific DNA (*) , 1995, The Journal of Biological Chemistry.

[8]  E. A. Ananko,et al.  Artsite Database: Comparison of In Vitro Selected and Natural Binding Sites of Eukaryotic Transcription Factors , 2006 .

[9]  Yuchun Guo,et al.  High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints , 2012, PLoS Comput. Biol..

[10]  Victor G. Levitsky,et al.  Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions , 2007, BMC Bioinformatics.

[11]  N. Kato,et al.  Identification and characterization of glucocorticoid receptor-binding sites in the human genome , 2010, Journal of receptor and signal transduction research.

[12]  R. Pictet,et al.  Hepatocyte nuclear factor 3 determines the amplitude of the glucocorticoid response of the rat tyrosine aminotransferase gene. , 1995, DNA and cell biology.

[13]  Edward J. Oakeley,et al.  Position dependencies in transcription factor binding sites , 2007, Bioinform..

[14]  Emmanuel Barillot,et al.  De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis , 2010, Nucleic acids research.

[15]  J. van Helden,et al.  RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets , 2011, Nucleic acids research.

[16]  Jan Komorowski,et al.  Molecular interactions between HNF4a, FOXA2 and GABP identified at regulatory DNA elements through ChIP-sequencing , 2009, Nucleic acids research.

[17]  R. Costa,et al.  Site-directed mutagenesis of hepatocyte nuclear factor (HNF) binding sites in the mouse transthyretin (TTR) promoter reveal synergistic interactions with its enhancer region. , 1991, Nucleic acids research.

[18]  Steven J. M. Jones,et al.  Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. , 2006, Genome research.

[19]  Felix Naef,et al.  Computational analysis of protein-DNA interactions from ChIP-seq data. , 2012, Methods in molecular biology.

[20]  G. Stormo,et al.  Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites , 2005, Nucleic acids research.

[21]  P. Farnham Insights from genomic profiling of transcription factors , 2009, Nature Reviews Genetics.

[22]  Raffaele Calogero,et al.  Genome-wide discovery of functional transcription factor binding sites by comparative genomics: The case of Stat3 , 2009, Proceedings of the National Academy of Sciences.

[23]  K. Kaestner,et al.  Glucocorticoid Receptor, C/EBP, HNF3, and Protein Kinase A Coordinately Activate the Glucocorticoid Response Unit of the Carbamoylphosphate Synthetase I Gene , 1998, Molecular and Cellular Biology.

[24]  Vladimir B. Bajic,et al.  Comparing the Success of Different Prediction Software in Sequence Analysis: A Review , 2000, Briefings Bioinform..

[25]  K. Kaestner,et al.  The Foxa family of transcription factors in development and metabolism , 2006, Cellular and Molecular Life Sciences CMLS.

[26]  Jens Keilwagen,et al.  A general approach for discriminative de novo motif discovery from high-throughput data , 2013, GCB.

[27]  Allen D. Delaney,et al.  Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing , 2007, Nature Methods.

[28]  R. Costa,et al.  The DNA-binding specificity of the hepatocyte nuclear factor 3/forkhead domain is influenced by amino-acid residues adjacent to the recognition helix , 1994, Molecular and cellular biology.

[29]  Hans Clevers,et al.  Efficient Double Fragmentation ChIP-seq Provides Nucleotide Resolution Protein-DNA Binding Profiles , 2010, PloS one.

[30]  Uwe Ohler,et al.  Optimized mixed Markov models for motif identification , 2006, BMC Bioinformatics.

[31]  G. Tuteja,et al.  Extracting transcription factor targets from ChIP-Seq data , 2009, Nucleic acids research.

[32]  Michael Q. Zhang,et al.  A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information , 2011, Nucleic acids research.

[33]  Thomas Zeng,et al.  Global analysis of in vivo Foxa2-binding sites in mouse adult liver using massively parallel sequencing , 2008, Nucleic acids research.

[34]  Victor G. Levitsky,et al.  Combined experimental and computational approaches to study the regulatory elements in eukaryotic genes , 2007, Briefings Bioinform..

[35]  Philip Machanick,et al.  MEME-ChIP: motif analysis of large DNA datasets , 2011, Bioinform..

[36]  Vsevolod J. Makeev,et al.  Deep and wide digging for binding motifs in ChIP-Seq data , 2010, Bioinform..

[37]  J. Darnell,et al.  Hepatocyte nuclear factor 3 alpha belongs to a gene family in mammals that is homologous to the Drosophila homeotic gene fork head. , 1991, Genes & development.

[38]  Andrey N. Naumochkin,et al.  Transcription Regulatory Regions Database (TRRD): its status in 2002 , 2002, Nucleic Acids Res..

[39]  David J. Arenillas,et al.  JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles , 2009, Nucleic Acids Res..

[40]  Anna G. Nazina,et al.  Homotypic regulatory clusters in Drosophila. , 2003, Genome research.

[41]  Klaus H. Kaestner,et al.  Novel computational analysis of protein binding array data identifies direct targets of Nkx2.2 in the pancreas , 2011, BMC Bioinformatics.

[42]  K. Kaestner,et al.  The FoxA factors in organogenesis and differentiation. , 2010, Current opinion in genetics & development.

[43]  T. Grange,et al.  Cell-type specific activity of two glucocorticoid responsive units of rat tyrosine aminotransferase gene is associated with multiple binding sites for C/EBP and a novel liver-specific nuclear factor , 1991, Nucleic Acids Res..

[44]  W. Lamers,et al.  Mechanisms of glucocorticoid signalling. , 2004, Biochimica et biophysica acta.

[45]  J. Chou,et al.  The role of HNF1alpha, HNF3gamma, and cyclic AMP in glucose-6-phosphatase gene activation. , 1997, Biochemistry.

[46]  Clifford A. Meyer,et al.  Model-based Analysis of ChIP-Seq (MACS) , 2008, Genome Biology.

[47]  Yongchao Liu,et al.  CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments , 2010, Bioinform..

[48]  Hyunsoo Kim,et al.  Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles , 2011, PloS one.

[49]  Alexander E. Kel,et al.  Transcription Regulatory Regions Database (TRRD): its status in 1999 , 1999, Nucleic Acids Res..

[50]  Vladimir B. Bajic,et al.  HOCOMOCO: a comprehensive collection of human transcription factor binding sites models , 2012, Nucleic Acids Res..

[51]  T. Maniatis,et al.  Common themes in the function of transcription and splicing enhancers. , 1997, Current opinion in cell biology.

[52]  L. Bryzgalov,et al.  Development of computational methods to search for FoxA transcription factor binding sites, their experimental verification and application to the analysis of ChIP-seq data , 2011, Doklady Biochemistry and Biophysics.

[53]  H. K. Dai,et al.  A survey of DNA motif finding algorithms , 2007, BMC Bioinformatics.

[54]  Gary D. Stormo,et al.  DNA binding sites: representation and discovery , 2000, Bioinform..

[55]  Atina G. Coté,et al.  Evaluation of methods for modeling transcription factor sequence specificity , 2013, Nature Biotechnology.