Predicting transcription factor binding sites using local over-representation and comparative genomics

BackgroundIdentifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms.ResultsWe have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes, which are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets.ConclusionTFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at http://bioinfo.lifl.fr/TFM-Explorer.

[1]  J. Hiscott,et al.  Characterization of a functional NF-kappa B site in the human interleukin 1 beta promoter: evidence for a positive autoregulatory loop , 1993, Molecular and cellular biology.

[2]  T. Werner,et al.  MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. , 1995, Nucleic acids research.

[3]  Jean-Michel Claverie,et al.  The statistical significance of nucleotide position-weight matrix matches , 1996, Comput. Appl. Biosci..

[4]  J. Fickett,et al.  Identification of regulatory regions which confer muscle-specific gene expression. , 1998, Journal of molecular biology.

[5]  J. Collado-Vides,et al.  Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. , 1998, Journal of molecular biology.

[6]  Gary D. Stormo,et al.  Identifying DNA and protein patterns with statistically significant alignments of multiple sequences , 1999, Bioinform..

[7]  Xin Chen,et al.  TRANSFAC: an integrated system for gene expression regulation , 2000, Nucleic Acids Res..

[8]  Gary D. Stormo,et al.  DNA binding sites: representation and discovery , 2000, Bioinform..

[9]  Marie-France Sagot,et al.  Algorithms for Extracting Structured Motifs Using a Suffix Tree with an Application to Promoter and Regulatory Site Consensus Identification , 2000, J. Comput. Biol..

[10]  T. Tsunoda,et al.  Identification and characterization of the potential promoter regions of 1031 kinds of human genes. , 2001, Genome research.

[11]  Kathleen Marchal,et al.  A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling , 2001, Bioinform..

[12]  D. Bernard,et al.  The c-Rel transcription factor can both induce and inhibit apoptosis in the same cells via the upregulation of MnSOD , 2002, Oncogene.

[13]  Alexander E. Kel,et al.  Composition-sensitive analysis of the human genome for regulatory signals , 2003, Silico Biol..

[14]  B. De Moor,et al.  Toucan: deciphering the cis-regulatory logic of coregulated genes. , 2003, Nucleic acids research.

[15]  Jiashun Zheng,et al.  An approach to identify over-represented cis-elements in related sequences. , 2003, Nucleic acids research.

[16]  Terrence S. Furey,et al.  The UCSC Genome Browser Database , 2003, Nucleic Acids Res..

[17]  Wyeth W. Wasserman,et al.  Identification of functional clusters of transcription factor binding motifs in genome sequences: the MSCAN algorithm , 2003, ISMB.

[18]  Wing Hung Wong,et al.  Determination of Local Statistical Significance of Patterns in Markov Sequences with Application to Promoter Element Identification , 2004, J. Comput. Biol..

[19]  A. Sandelin,et al.  Applied bioinformatics for the identification of regulatory elements , 2004, Nature Reviews Genetics.

[20]  Wyeth W. Wasserman,et al.  JASPAR: an open-access database for eukaryotic transcription factor binding profiles , 2004, Nucleic Acids Res..

[21]  Wyeth W. Wasserman,et al.  MSCAN: identification of functional clusters of transcription factor binding sites , 2004, Nucleic Acids Res..

[22]  Vladimir B. Bajic,et al.  Promoter modeling: the case study of mammalian histone promoters , 2005, Bioinform..

[23]  William Stafford Noble,et al.  Assessing computational tools for the discovery of transcription factor binding sites , 2005, Nature Biotechnology.

[24]  S. Eddy A Model of the Statistical Power of Comparative Genome Sequence Analysis , 2005, PLoS biology.

[25]  Tatiana A. Tatusova,et al.  NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins , 2004, Nucleic Acids Res..

[26]  David J. Arenillas,et al.  oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes , 2005, Nucleic acids research.

[27]  A. Nordheim,et al.  Requirement for serum response factor for skeletal muscle growth and maturation revealed by tissue-specific gene deletion in mice. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[28]  Jean-Stéphane Varré,et al.  Large Scale Matching for Position Weight Matrices , 2006, CPM.

[29]  Tatiana Tatusova,et al.  NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins , 2004, Nucleic Acids Res..