An information transmission model for transcription factor binding at regulatory DNA sites

BackgroundComputational identification of transcription factor binding sites (TFBSs) is a rapid, cost-efficient way to locate unknown regulatory elements. With increased potential for high-throughput genome sequencing, the availability of accurate computational methods for TFBS prediction has never been as important as it currently is. To date, identifying TFBSs with high sensitivity and specificity is still an open challenge, necessitating the development of novel models for predicting transcription factor-binding regulatory DNA elements.ResultsBased on the information theory, we propose a model for transcription factor binding of regulatory DNA sites. Our model incorporates position interdependencies in effective ways. The model computes the information transferred (TI) between the transcription factor and the TFBS during the binding process and uses TI as the criterion to determine whether the sequence motif is a possible TFBS. Based on this model, we developed a computational method to identify TFBSs. By theoretically proving and testing our model using both real and artificial data, we found that our model provides highly accurate predictive results.ConclusionsIn this study, we present a novel model for transcription factor binding regulatory DNA sites. The model can provide an increased ability to detect TFBSs.

[1]  G. Church,et al.  Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. , 2002, Nucleic acids research.

[2]  Alberto Riva,et al.  MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes , 2005, BMC Bioinformatics.

[3]  H. Kono,et al.  Structure‐based prediction of DNA target sites by regulatory proteins , 1999, Proteins.

[4]  Claude E. Shannon,et al.  A mathematical theory of communication , 1948, MOCO.

[5]  Alexander E. Kel,et al.  MATCHTM: a tool for searching transcription factor binding sites in DNA sequences , 2003, Nucleic Acids Res..

[6]  E. Wingender,et al.  MATCH: A tool for searching transcription factor binding sites in DNA sequences. , 2003, Nucleic acids research.

[7]  Bruce J. Aronow,et al.  CisMols Analyzer: identification of compositionally similar cis-element clusters in ortholog conserved regions of coordinately expressed genes , 2005, Nucleic Acids Res..

[8]  Richard H. Lathrop,et al.  DNA sequence and structure: direct and indirect recognition in protein-DNA binding , 2002, ISMB.

[9]  P. V. von Hippel,et al.  Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. , 1987, Journal of molecular biology.

[10]  D. Baker,et al.  Protein–DNA binding specificity predictions with structural models , 2005, Nucleic acids research.

[11]  Karan P. Singh,et al.  Theoretical Biology and Medical Modelling , 2007 .

[12]  Aaron Golden,et al.  Transcription factor binding site identification using the self-organizing map , 2005, Bioinform..

[13]  B. Honig,et al.  Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry , 2007, Nucleic acids research.

[14]  F. Robert,et al.  Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression , 2006 .

[15]  Bart De Moor,et al.  Computational detection of cis-regulatory modules , 2003, ECCB.

[16]  Hélène Touzet,et al.  Predicting transcription factor binding sites using local over-representation and comparative genomics , 2006, BMC Bioinformatics.

[17]  F. P. Roth,et al.  A non-parametric model for transcription factor binding sites. , 2003, Nucleic acids research.

[18]  Satoru Miyano,et al.  Bayesian Joint Prediction of Associated Transcription Factors in Bacillus subtilis , 2005, Pacific Symposium on Biocomputing.

[19]  Qing Zhou,et al.  Modeling within-motif dependence for transcription factor binding site predictions , 2004, Bioinform..

[20]  H. Bussemaker,et al.  Regulatory element detection using correlation with expression , 2001, Nature Genetics.

[21]  G. Church,et al.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. , 2000, Journal of molecular biology.

[22]  C. E. SHANNON,et al.  A mathematical theory of communication , 1948, MOCO.

[23]  Nir Friedman,et al.  Modeling dependencies in protein-DNA binding sites , 2003, RECOMB '03.

[24]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.

[25]  Edward J. Oakeley,et al.  Position dependencies in transcription factor binding sites , 2007, Bioinform..

[26]  Jun S. Liu,et al.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[27]  D. Guhathakurta,et al.  Computational identification of transcriptional regulatory elements in DNA sequence , 2006, Nucleic acids research.

[28]  Terence P. Speed,et al.  Finding short DNA motifs using permuted markov models , 2004, RECOMB.

[29]  Arend Sidow,et al.  Genomic regulatory regions: insights from comparative sequence analysis. , 2003, Current opinion in genetics & development.

[30]  G. Stormo,et al.  Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. , 2001, Nucleic acids research.

[31]  Michael Q. Zhang,et al.  SCPD: a promoter database of the yeast Saccharomyces cerevisiae , 1999, Bioinform..

[32]  C. Pabo,et al.  Analysis of zinc fingers optimized via phage display: evaluating the utility of a recognition code. , 1999, Journal of molecular biology.

[33]  T. D. Schneider,et al.  Information content of binding sites on nucleotide sequences. , 1986, Journal of molecular biology.

[34]  D. S. Fields,et al.  Specificity, free energy and information content in protein-DNA interactions. , 1998, Trends in biochemical sciences.

[35]  G. Stormo,et al.  Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments. , 1992, Journal of molecular biology.

[36]  Anirvan M. Sengupta,et al.  A biophysical approach to transcription factor binding site discovery. , 2003, Genome research.

[37]  Tao Jiang,et al.  Identifying transcription factor binding sites through Markov chain optimization , 2002, ECCB.

[38]  Panayiotis V Benos,et al.  Probabilistic code for DNA recognition by proteins of the EGR family. , 2002, Journal of molecular biology.