Fast and Practical Algorithms for Planted (l, d) Motif Search

We consider the planted (l, d) motif search problem, which consists of finding a substring of length l that occurs in each of a set of input sequences {s1, ..., sn} with up to d errors. The problem arises from the need to find transcription factor binding sites in genomic information. We propose a sequence of practical algorithms that build on the ideas of PMS1. These algorithms are exact, have low space requirements, and can tackle challenging instances with larger d, while taking less time on the instances already reported as solved by exact algorithms. In particular, one of the proposed algorithms, PMSprune, solves challenging instances such as (17, 6) and (19, 7), which had not previously been reported as solved in the literature.
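To make the problem statement concrete, the following is a minimal brute-force sketch of the (l, d) motif search definition. It is not PMS1 or PMSprune. It assumes that "errors" means mismatches (Hamming distance) and that the alphabet is DNA. The function names `hamming`, `occurs_with_errors`, and `brute_force_motifs` are hypothetical, chosen for illustration only. Because it enumerates all 4^l candidate strings, it is usable only for very small l.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_errors(motif, seq, d):
    """True if some length-l window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def brute_force_motifs(seqs, l, d, alphabet="ACGT"):
    """Enumerate every length-l string over the alphabet (assumed DNA here)
    and keep those that occur in every input sequence with at most d
    mismatches. Exponential in l; for illustration only."""
    motifs = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(occurs_with_errors(candidate, s, d) for s in seqs):
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    # Toy input: with d = 0, a motif must appear exactly in every sequence.
    print(brute_force_motifs(["ACGTAC", "TTACGA", "GACGTT"], l=3, d=0))
```

The challenging instances named in the abstract, such as (17, 6) and (19, 7), are far beyond what this kind of exhaustive enumeration can handle, which is why pruning strategies matter.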
