MoRAine - A web server for fast computational transcription factor binding motif re-annotation

BACKGROUND A precise experimental identification of transcription factor binding motifs (TFBMs), accurate to a single base pair, is time-consuming and difficult. For several databases, TFBM annotations are extracted from the literature and stored 5' --> 3' relative to the target gene. Mixing the two possible orientations of a motif lowers the information content of the position frequency matrices (PFMs) and sequence logos computed from these annotations. Since these PFMs are used to predict further TFBMs, we ask whether the TFBMs underlying a PFM can be re-annotated automatically to improve both the information content of the PFM and subsequent classification performance.

RESULTS We present MoRAine, an algorithm that re-annotates transcription factor binding motifs. Each motif with experimental evidence underlying a PFM is compared against every other such motif. The goal is to re-annotate TFBMs by possibly switching their strands and shifting them by a few positions in order to maximize the information content of the resulting adjusted PFM. We present two heuristic strategies to perform this optimization and subsequently show that MoRAine significantly improves the corresponding sequence logos. Furthermore, we justify the method by evaluating specificity, sensitivity, true positive, and false positive rates of PFM-based TFBM predictions for E. coli using the original database motifs and the MoRAine-adjusted motifs. The classification performance increases considerably when MoRAine is used as a preprocessing step.

CONCLUSIONS MoRAine is integrated into a publicly available web server and can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.
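The optimization goal described above can be illustrated with a minimal sketch. It is not MoRAine's actual implementation: the abstract does not specify the two heuristic strategies, so the greedy loop below is an assumed stand-in, and the function names (`reverse_complement`, `information_content`, `greedy_reorient`) are hypothetical. For brevity the sketch only switches strands and omits the shifting step. It also assumes equal-length motifs over the alphabet A, C, G, T and a uniform background, so each PFM column contributes at most 2 bits.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(motif):
    """Return the motif as read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def information_content(motifs):
    """Sum over PFM columns of (2 - Shannon entropy), in bits.

    Assumes equal-length motifs and a uniform background distribution.
    """
    n = len(motifs)
    total = 0.0
    for column in zip(*motifs):
        entropy = 0.0
        for base in "ACGT":
            p = column.count(base) / n
            if p > 0:
                entropy -= p * math.log2(p)
        total += 2.0 - entropy
    return total


def greedy_reorient(motifs):
    """Assumed greedy heuristic (not the paper's): flip one motif at a time
    whenever the flip strictly increases the information content, and stop
    when a full pass yields no improvement."""
    current = list(motifs)
    best = information_content(current)
    improved = True
    while improved:
        improved = False
        for i, motif in enumerate(current):
            candidate = current[:i] + [reverse_complement(motif)] + current[i + 1:]
            score = information_content(candidate)
            if score > best + 1e-12:
                current, best, improved = candidate, score, True
    return current, best
```

As a usage example, calling `greedy_reorient(["AAACCC", "GGGTTT", "AAACCC"])` flips the second motif, because `GGGTTT` is the reverse complement of `AAACCC`; every column then contains a single base and the information content reaches the maximum of 2 bits per column.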