A Constraint Programming Approach for Enumerating Motifs in a Sequence

In this paper, we propose a constraint programming approach for enumerating all frequent patterns with wildcards in a given sequence. To reduce the search space, we show that the anti-monotonicity property of frequent patterns can be encoded dynamically using a nogood-recording approach. Finally, the constraint network is encoded as a Boolean formula. This last step allows us to exploit the efficiency of modern SAT solvers, particularly their clause-learning component. Preliminary experiments on real-world data show the feasibility of our approach in practice.
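
To make the task and the role of anti-monotonicity concrete, the following is a minimal sketch in Python. It is not the constraint or SAT encoding proposed in the paper: it is a plain depth-first enumeration, and the names `occurrences`, `frequent_motifs`, `min_support`, and `max_length`, the use of `.` as the wildcard symbol, and the definition of support as the number of matching start positions are all assumptions made for illustration. The sketch relies on one fact: extending a pattern on the right can only shrink its set of occurrences, so an infrequent pattern has no frequent extension and its branch can be cut.

```python
WILDCARD = "."  # assumed wildcard symbol; the sequence must not contain it


def occurrences(pattern, sequence):
    """Return the start positions at which pattern matches sequence."""
    m = len(pattern)
    return [
        i
        for i in range(len(sequence) - m + 1)
        if all(p == WILDCARD or p == c
               for p, c in zip(pattern, sequence[i:i + m]))
    ]


def frequent_motifs(sequence, min_support, max_length):
    """Enumerate patterns with wildcards that occur at least min_support times.

    Patterns start and end with a solid (non-wildcard) symbol and have
    at most max_length symbols. Returns a dict mapping pattern to support.
    """
    alphabet = sorted(set(sequence))
    found = {}

    def extend(pattern):
        support = len(occurrences(pattern, sequence))
        if support < min_support:
            # Anti-monotonicity: every right extension matches a subset of
            # these positions, so none of them can be frequent either.
            return
        if pattern[-1] != WILDCARD:
            # Patterns ending in a wildcard are only intermediate steps.
            found[pattern] = support
        if len(pattern) < max_length:
            for symbol in alphabet + [WILDCARD]:
                extend(pattern + symbol)

    for symbol in alphabet:
        extend(symbol)
    return found


if __name__ == "__main__":
    print(frequent_motifs("abcadc", min_support=2, max_length=3))
```

In this sketch the pruning is a static test inside the search. The abstract describes something different: the same property is encoded dynamically through recorded nogoods, so that a solver can reuse it during search.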
