Functional classification of transcription factor binding sites: information content as a metric

The information content (relative entropy) of transcription factor binding sites (TFBS) is used to classify the transcription factors (TFs). TFBS are clustered by their information content, and the TF classes are grouped according to the resulting TFBS clusters. Any TF belonging to a TF class cluster has a chance of binding to any TFBS in the corresponding clustered group. Thus, of the 41 TFBS in humans, perhaps only 5 to 10 TFs are actually needed, and in mouse about 5 TFs may suffice instead of 13. The TFBS profiles used in this study are taken from the JASPAR database. Experimental data from the TRRD database on the TFs involved in the expression of specific genes also agree with our computational results. This offers a new way to classify proteins, based not on their structure or function but on the nature of their TFBS.
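As a minimal sketch of the metric, assume each TFBS profile is a JASPAR-style position frequency matrix (counts of A, C, G, T at each position). The information content is then the relative entropy summed over positions, I = Σ_i Σ_b p_{i,b} log2(p_{i,b} / q_b), where p_{i,b} is the frequency of base b at position i and q_b is the background frequency. The uniform background, the pseudocount of 0.25 per base, the function names, and the average-linkage grouping below are all illustrative assumptions; the abstract does not state the authors' exact settings or clustering procedure.

```python
import math


def column_information(counts, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.25):
    """Relative entropy (bits) of one PFM column [A, C, G, T] against a background.

    The pseudocount (an assumed value) avoids log(0) for unobserved bases.
    """
    total = sum(counts) + pseudocount * len(counts)
    ic = 0.0
    for n, q in zip(counts, background):
        p = (n + pseudocount) / total
        ic += p * math.log2(p / q)
    return ic


def profile_information(pfm, **kwargs):
    """Total information content of a PFM given as a list of [A, C, G, T] count columns."""
    return sum(column_information(col, **kwargs) for col in pfm)


def average_linkage_clusters(values, cutoff):
    """Group labelled scalar values by average-linkage (UPGMA-style) agglomeration.

    `values` maps a profile name to its information content. Merging stops once
    the closest pair of clusters is farther apart than `cutoff` (in bits).
    """
    clusters = [[name] for name in values]

    def dist(a, b):
        # Mean absolute IC difference over all cross-cluster pairs.
        return sum(abs(values[x] - values[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical count matrices for illustration only; real profiles would come from JASPAR.
    pfms = {
        "TF_1": [[9, 0, 1, 0], [0, 0, 10, 0], [0, 10, 0, 0]],
        "TF_2": [[10, 0, 0, 0], [0, 1, 9, 0], [0, 9, 0, 1]],
        "TF_3": [[3, 2, 3, 2], [2, 3, 2, 3], [4, 2, 2, 2]],
    }
    ic = {name: profile_information(pfm) for name, pfm in pfms.items()}
    for name, value in ic.items():
        print(f"{name}: {value:.2f} bits")
    print(average_linkage_clusters(ic, cutoff=1.0))
```

Grouping on a single scalar per profile is a deliberate simplification; comparing per-position information profiles, or swapping in a genomic background for q_b, would be natural refinements.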
