Hybrid Gibbs-sampling algorithm for challenging motif discovery: GibbsDST.

The difficulties of computational discovery of transcription factor binding sites (TFBS) are well represented by (l, d) planted motif challenge problems. Large d problems are difficult, particularly for profile-based motif discovery algorithms. Their local search in the profile space is apparently incompatible with subtle motifs and large mutational distances between the motif occurrences. Herein, an improved profile-based method called GibbsDST is described and tested on (15,4), (12,3), and (18,6) challenging problems. For the first time for a profile-based method, its performance in motif challenge problems is comparable to that of Random Projection. It is noteworthy that GibbsDST outperforms a pattern-based algorithm, WINNOWER, in some cases. Effectiveness of GibbsDST using a biological dataset as an example and its possible extension to more realistic evolution models are also introduced.

[1]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.

[2]  G. Stormo,et al.  Identifying protein-binding sites from unaligned DNA fragments. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[3]  Jun S. Liu,et al.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[4]  Mathieu Blanchette,et al.  PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences , 2004, BMC Bioinformatics.

[5]  Pavel A. Pevzner,et al.  Combinatorial Approaches to Finding Subtle Signals in DNA Sequences , 2000, ISMB.

[6]  M Ishikawa,et al.  Multiple sequence alignment by parallel simulated annealing , 1993, Comput. Appl. Biosci..

[7]  Eleazar Eskin,et al.  Finding composite regulatory patterns in DNA sequences , 2002, ISMB.

[8]  Jeremy Buhler,et al.  Finding motifs using random projections , 2001, RECOMB.

[9]  Erik van Nimwegen,et al.  PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny , 2005, PLoS Comput. Biol..

[10]  G. Parisi,et al.  Simulated tempering: a new Monte Carlo scheme , 1992, hep-lat/9205018.

[11]  Z. Weng,et al.  Finding functional sequence elements by multiple local alignment. , 2004, Nucleic acids research.

[12]  Serafim Batzoglou,et al.  MotifCut: regulatory motifs finding with maximum density subgraphs , 2006, ISMB.

[13]  G. Crooks,et al.  WebLogo: a sequence logo generator. , 2004, Genome research.

[14]  Qing Zhou,et al.  Modeling within-motif dependence for transcription factor binding site predictions , 2004, Bioinform..

[15]  H. Munro,et al.  Mammalian protein metabolism , 1964 .

[16]  Julio Collado-Vides,et al.  RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions , 2005, Nucleic Acids Res..