SimpLiSMS: A Simple, Lightweight and Fast Approach for Structured Motifs Searching

A structured motif is a sequence of simple motifs in which consecutive motifs are separated by gaps that must satisfy given distance constraints. We present SimpLiSMS, a simple, lightweight, and fast algorithm for searching structured motifs. SimpLiSMS does not use any sophisticated data structures, which keeps it simple and lightweight. In our experiments, SimpLiSMS performs very well. We also introduce a parallel version of SimpLiSMS that runs even faster.
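To make the definition of a structured motif concrete, here is a minimal brute-force sketch in Python. It is not the SimpLiSMS algorithm, which the abstract does not describe. It only illustrates the search problem, assuming exact matching and a gap constraint given as an inclusive (min, max) number of characters between consecutive motifs. The function name `find_structured` and its parameters are hypothetical.

```python
from typing import List, Tuple


def find_structured(
    text: str, motifs: List[str], gaps: List[Tuple[int, int]]
) -> List[Tuple[int, ...]]:
    """Naive structured motif search (illustrative baseline only).

    motifs: simple motifs that must occur in this order.
    gaps:   gaps[k] = (min, max) number of characters allowed between
            the end of motifs[k] and the start of motifs[k + 1].
    Returns one tuple of start positions per occurrence.
    """
    if not motifs or len(gaps) != len(motifs) - 1:
        raise ValueError("need one gap constraint between each pair of motifs")

    results: List[Tuple[int, ...]] = []

    def extend(k: int, starts: List[int]) -> None:
        # All motifs placed: record the occurrence.
        if k == len(motifs):
            results.append(tuple(starts))
            return
        prev_end = starts[-1] + len(motifs[k - 1])
        lo, hi = gaps[k - 1]
        # Try every start position allowed by the distance constraint.
        for pos in range(prev_end + lo, prev_end + hi + 1):
            if text.startswith(motifs[k], pos):
                extend(k + 1, starts + [pos])

    # Anchor on every occurrence of the first motif (overlaps included).
    pos = text.find(motifs[0])
    while pos != -1:
        extend(1, [pos])
        pos = text.find(motifs[0], pos + 1)
    return results


# "ACG" followed by "GCA" after a gap of 1 to 3 characters.
hits = find_structured("ACGTTTGCAACGTGCA", ["ACG", "GCA"], [(1, 3)])
```

In this example `hits` contains the start positions of both motifs for each occurrence. Tightening the gap to `(1, 2)` drops the first occurrence, whose gap is three characters long.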
