Efficient mining of sequential patterns with time constraints: Reducing the combinations

In this paper we consider the problem of discovering sequential patterns under time constraints as defined in the Gsp algorithm. While sequential patterns can be seen as temporal relationships between facts embedded in the database, where the facts considered are merely characteristics of individuals or observations of individual behavior, generalized sequential patterns aim to give the end user more flexible handling of the transactions embedded in the database. We thus propose a new efficient algorithm, called Gtc (Graph for Time Constraints), for mining such patterns in very large databases. It is based on the idea that handling time constraints at an earlier stage of the data mining process can be highly beneficial. One of the most significant new features of our approach is that the handling of time constraints can easily be incorporated into traditional levelwise approaches, since it is carried out prior to and separately from the counting step of a data sequence. Our tests show that the proposed algorithm performs significantly faster than a state-of-the-art sequence mining algorithm.
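To make the notion of time constraints concrete, the following is a minimal sketch of what it means for a data sequence to support a candidate pattern under the three constraints usually associated with Gsp: a sliding window, a minimum gap and a maximum gap. It is not the Gtc algorithm and does not build any graph; it is a brute-force check written for illustration. The function name `supports`, the parameter names, and the representation of a data sequence as a time-sorted list of `(timestamp, items)` pairs are all assumptions introduced here, and the constraint definitions follow our reading of the Gsp formulation rather than anything stated in this abstract.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical representation: a data sequence is a list of
# (timestamp, itemset) pairs sorted by increasing timestamp.
Transaction = Tuple[int, FrozenSet[str]]


def supports(
    data: List[Transaction],
    candidate: List[FrozenSet[str]],
    window_size: float = 0,
    min_gap: float = 0,
    max_gap: float = float("inf"),
) -> bool:
    """Brute-force test: does `data` support `candidate` under the constraints?

    Each candidate itemset must be covered by a run of transactions
    data[l..u] such that (assumed Gsp-style definitions):
      - time(u) - time(l) <= window_size
      - time(l) - time(previous u) > min_gap
      - time(u) - time(previous l) <= max_gap
    """

    def search(i: int, prev_l: Optional[int], prev_u: Optional[int]) -> bool:
        if i == len(candidate):
            return True  # every itemset of the candidate has been placed
        start = 0 if prev_u is None else prev_u + 1
        for l in range(start, len(data)):
            # min-gap: the new run must start late enough after the previous run
            if prev_u is not None and data[l][0] - data[prev_u][0] <= min_gap:
                continue
            merged = set()
            for u in range(l, len(data)):
                # sliding window: the run may not span more than window_size
                if data[u][0] - data[l][0] > window_size:
                    break
                # max-gap: the run must end soon enough after the previous start
                if prev_l is not None and data[u][0] - data[prev_l][0] > max_gap:
                    break
                merged |= data[u][1]
                if candidate[i] <= merged and search(i + 1, l, u):
                    return True
        return False

    return search(0, None, None)


# Toy data sequence: item a at time 1, b at time 2, c at time 5.
data = [(1, frozenset("a")), (2, frozenset("b")), (5, frozenset("c"))]
pattern = [frozenset("ab"), frozenset("c")]

print(supports(data, pattern))                 # no single transaction holds a and b
print(supports(data, pattern, window_size=1))  # a and b may be merged across times 1..2
print(supports(data, pattern, window_size=1, max_gap=3))
```

In this toy sequence the pattern is only supported once the window allows the transactions at times 1 and 2 to be merged, and tightening `max_gap` to 3 rules it out again because the last itemset ends 4 time units after the first one starts. A check of this kind has to be repeated for every candidate and every data sequence during counting, which is the cost that motivates handling the constraints beforehand.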
