Finding motifs in the twilight zone

We introduce the notion of a multiprofile and use it to find subtle motifs in DNA sequences. Multiprofiles generalize the notion of a profile and make it possible to detect subtle consensus sequences that escape detection by standard profiles. Our MULTIPROFILER algorithm outperforms other leading motif-finding algorithms on a number of synthetic models. Moreover, in some previously studied motif models, MULTIPROFILER can be shown to push the performance envelope to its theoretical limits.
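Multiprofiles build on the standard profile: a matrix that gives, for each position of a length-l motif, the frequency of each nucleotide among aligned motif instances. The Python sketch below shows only such a standard profile and how it is used to scan a sequence for its best-matching l-mer. It does not implement multiprofiles or MULTIPROFILER, whose construction is not described here. The function names (`build_profile`, `log_odds`, `best_match`), the pseudocount of 1, and the uniform 0.25 background are illustrative assumptions.

```python
import math

ALPHABET = "ACGT"


def build_profile(lmers, pseudocount=1.0):
    """Standard profile: column j maps each base to its frequency at position j.

    A pseudocount (assumed value: 1) keeps unseen bases from getting probability 0.
    """
    length = len(lmers[0])
    if any(len(s) != length for s in lmers):
        raise ValueError("all l-mers must have the same length")
    profile = []
    for j in range(length):
        counts = {b: pseudocount for b in ALPHABET}
        for s in lmers:
            counts[s[j]] += 1  # raises KeyError for non-ACGT characters
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile


def log_odds(profile, lmer, background=0.25):
    """Log-odds score of an l-mer under the profile vs. an assumed uniform background."""
    return sum(math.log2(col[b] / background) for col, b in zip(profile, lmer))


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (score, position) of the best l-mer.

    Ties in score go to the rightmost position (tuple comparison in max).
    """
    l = len(profile)
    return max(
        (log_odds(profile, sequence[i:i + l]), i)
        for i in range(len(sequence) - l + 1)
    )


# Usage: four aligned sites whose consensus is ACGTA.
sites = ["ACGTA", "ACGTT", "ACCTA", "TCGTA"]
profile = build_profile(sites)
score, pos = best_match(profile, "GGGACGTAGGG")
print(pos)  # → 3
```

Because each column scores bases independently, a standard profile cannot capture correlations between positions; the abstract's point is that multiprofiles generalize this representation to catch subtle consensus sequences it misses.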
