A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database

Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc.). In the context of sequential pattern mining, a large number of dedicated techniques have been developed, each solving a particular class of constraints. The aim of this paper is to investigate the use of Constraint Programming (CP) to model and mine sequential patterns in a sequence database. Our CP approach offers a natural way to combine, within a single framework, a large set of constraints coming from various origins. Experiments show the feasibility and the practical interest of our approach.
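
To make the notion of constraints on sequential patterns concrete, the following minimal sketch enumerates patterns that satisfy a frequency constraint and a size constraint at the same time. It is a brute-force illustration only, not the CP model investigated in the paper. The toy database, the function names (`is_subsequence`, `support`, `mine`), and the simplification that each sequence is a string of single-character items are all assumptions made for this example.

```python
from itertools import product


def is_subsequence(pattern, sequence):
    """Return True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items are only searched for after earlier ones.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine(database, min_support, max_size):
    """Enumerate every pattern with size <= max_size and support >= min_support."""
    alphabet = sorted({item for seq in database for item in seq})
    frequent = {}
    for size in range(1, max_size + 1):
        for candidate in product(alphabet, repeat=size):
            count = support(candidate, database)
            if count >= min_support:  # frequency constraint
                frequent[candidate] = count
    return frequent


# Hypothetical toy database: three sequences over the items a, b, c.
database = ["abcb", "acb", "bca"]
patterns = mine(database, min_support=2, max_size=2)

for pattern, count in sorted(patterns.items()):
    print("".join(pattern), count)
```

In this sketch, adding a further constraint means adding another test inside the enumeration loop. A declarative CP formulation would instead state the constraints and leave the search to the solver.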
