Measuring similarities between transcription factor binding sites

Background: Collections of transcription factor binding profiles (Transfac, Jaspar) are essential for identifying regulatory elements in DNA sequences. Subsets of highly similar profiles complicate large-scale analysis of transcription factor binding sites.

Results: We propose to identify and group similar profiles using two independent similarity measures: χ² distances between position frequency matrices (PFMs) and correlation coefficients between position weight matrix (PWM) scores.

Conclusion: We show that these measures complement each other and allow Jaspar and Transfac matrices to be associated with one another. Clusters of highly similar matrices are identified and can be used to optimise the search for regulatory elements. Moreover, the application of the measures is illustrated by assigning E-box matrices, derived from a SELEX experiment and from experimentally characterised binding sites of circadian clock genes, to the Myc-Max cluster.
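The two measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how matrices are aligned, which pseudocounts or background frequencies are used, or how thresholds are chosen. The sketch therefore assumes two pre-aligned PFMs of equal width (columns given as A, C, G, T counts), a pseudocount of 1, a uniform background, and scoring over all sequences of the matrix width. All function names and the toy matrices are hypothetical.

```python
import itertools
import math

BASES = "ACGT"

def chi2_column(a, b):
    """Chi-square homogeneity statistic for two columns of base counts."""
    na, nb = sum(a), sum(b)
    stat = 0.0
    for ca, cb in zip(a, b):
        pooled = ca + cb
        if pooled == 0:
            continue  # a base absent from both columns contributes nothing
        ea = na * pooled / (na + nb)  # expected count in column a
        eb = nb * pooled / (na + nb)  # expected count in column b
        stat += (ca - ea) ** 2 / ea + (cb - eb) ** 2 / eb
    return stat

def chi2_distance(pfm1, pfm2):
    """Sum of column-wise statistics (assumes equal width, no shifts)."""
    return sum(chi2_column(a, b) for a, b in zip(pfm1, pfm2))

def pwm_from_pfm(pfm, pseudo=1.0, background=0.25):
    """Log-odds PWM; pseudocount and uniform background are assumptions."""
    pwm = []
    for col in pfm:
        total = sum(col) + 4 * pseudo
        pwm.append([math.log2((c + pseudo) / total / background) for c in col])
    return pwm

def score(pwm, seq):
    """Additive PWM score of a sequence with the same width as the matrix."""
    return sum(col[BASES.index(ch)] for col, ch in zip(pwm, seq))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def score_correlation(pfm1, pfm2):
    """Correlate the scores both PWMs assign to every sequence of that width."""
    width = len(pfm1)
    pwm1, pwm2 = pwm_from_pfm(pfm1), pwm_from_pfm(pfm2)
    seqs = ["".join(p) for p in itertools.product(BASES, repeat=width)]
    return pearson([score(pwm1, s) for s in seqs],
                   [score(pwm2, s) for s in seqs])

# Toy matrices (invented counts): an E-box-like CACGTG profile and an
# unrelated AT-rich profile; each column lists counts for A, C, G, T.
ebox = [[1, 8, 0, 1], [8, 0, 1, 1], [0, 9, 0, 1],
        [0, 0, 10, 0], [1, 0, 0, 9], [0, 0, 10, 0]]
at_rich = [[0, 0, 0, 10], [10, 0, 0, 0], [0, 0, 0, 10],
           [10, 0, 0, 0], [9, 0, 0, 1], [10, 0, 0, 0]]

print(chi2_distance(ebox, at_rich), score_correlation(ebox, at_rich))
```

A small χ² distance together with a score correlation close to 1 would mark two matrices as candidates for the same cluster. Enumerating all sequences is only practical for short matrices; for wider ones, scoring a random sample of sequences is a common substitute.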
