Phyloscan: locating transcription-regulating binding sites in mixed aligned and unaligned sequence data

The transcription of a gene from its DNA template into an mRNA molecule is the first, and most heavily regulated, step in gene expression. Especially in bacteria, regulation is typically achieved by the binding of a transcription factor (a protein) or a small RNA molecule to the chromosomal region upstream of the regulated gene. The protein or RNA molecule recognizes a short, approximately conserved sequence within the gene's promoter region and, by binding to it, either enhances or represses expression of the nearby gene. Because the motif (pattern) being sought is short and tolerates variation, computational approaches that scan for binding sites have trouble distinguishing functional sites from look-alikes: many cannot find the majority of experimentally verified binding sites without also reporting many false positives. Phyloscan overcomes this difficulty by exploiting two key features of functional binding sites: (i) these sites are typically more conserved evolutionarily than non-functional DNA sequences; and (ii) these sites often occur two or more times in the promoter region of a regulated gene. The website is free and open to all users, with no login requirement, at http://bayesweb.wadsworth.org/phyloscan/.
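
The trade-off described above, between missing real sites and admitting look-alikes, can be illustrated with a minimal sketch of a generic single-sequence motif scan. This is not Phyloscan's algorithm, which additionally uses cross-species conservation and multiple sites per promoter. The 4-base motif, its probabilities, the uniform background, the thresholds, and the example sequence are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical motif model: per-position base probabilities (illustrative only).
# The consensus of this toy motif is "ATGC".
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(window):
    """Sum of per-position log2(motif probability / background probability)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, window))


def scan(seq, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(MOTIF)
    hits = []
    for start in range(len(seq) - width + 1):
        score = log_odds(seq[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


promoter = "CCATGCTTATGAGG"  # hypothetical upstream sequence

# A strict threshold reports only the exact consensus match.
strict_hits = scan(promoter, threshold=5.0)

# A looser threshold also reports a one-mismatch window ("ATGA"), which could
# be a genuine degenerate site or merely a look-alike.
loose_hits = scan(promoter, threshold=3.0)

print([start for start, _ in strict_hits])
print([start for start, _ in loose_hits])
```

With a motif this short, the score alone cannot say whether the extra window found at the looser threshold is functional, which is why additional evidence such as evolutionary conservation or repeated sites is useful.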
