RISOTTO: Fast Extraction of Motifs with Mismatches

We present an exact algorithm for motif extraction. Its efficiency comes from an improvement to the algorithm and data structures that applies to the whole class of motif inference algorithms based on suffix trees. An average-case complexity analysis shows a gain over the best known exact algorithm for motif extraction. A full implementation was developed and made available online. Experimental results show that the proposed algorithm is more than twice as fast as the best known exact algorithm for motif extraction.
