Tmod: toolbox of motif discovery

SUMMARY: Motif discovery is an important topic in computational studies of transcriptional regulation. Over the past decade, many researchers have contributed to the field and many de novo motif-finding tools have been developed, each with its own strengths. However, most of these tools lack a user-friendly interface, and their results are not easily comparable. We present a software package called Toolbox of Motif Discovery (Tmod) for Windows operating systems. The current version of Tmod integrates 12 widely used motif discovery programs: MDscan, BioProspector, AlignACE, Gibbs Motif Sampler, MEME, CONSENSUS, MotifRegressor, GLAM, MotifSampler, SeSiMCMC, Weeder and YMF. Tmod provides a unified interface that eases the use of these programs and helps users understand their tuning parameters. It allows the plug-in motif-finding programs to run either separately or in batch mode with predetermined parameters, and provides a summary comprising the outputs from multiple programs. Tmod is developed in C++ with the support of Microsoft Foundation Classes and Cygwin, and can easily be expanded to include future algorithms.

AVAILABILITY: Tmod is available for download at http://www.fas.harvard.edu/~junliu/Tmod/.
