Optimal extraction of motif patterns in 2D

The combinatorial explosion of motif patterns occurring in 1D and 2D arrays motivates the study of special classes of motifs whose number grows linearly with the size of the input array. Such motifs, called irredundant motifs, succinctly represent all of the other motifs occurring in the same array within reasonable time and space bounds. In previous work, irredundant motifs were extracted from 2D arrays in O(N^2 log^2 n log log n) and O(N^3) time, where N is the size of the 2D input array and n is its largest dimension. In this paper, we present an algorithm that extracts irredundant motifs from 2D arrays defined on a binary alphabet in time quadratic in the size of the input. The algorithm is shown to be optimal and faster in practice than the previous ones.
