Mining Closed Sequential Patterns with Time Constraints

The mining of closed sequential patterns has attracted researchers for its capability of using compact results to preserving the same expressive power as traditional mining. Many studies have shown that constraints are essential for applications of sequential patterns. However, time constraints have not been incorporated into closed sequence mining yet. Therefore, we propose an algorithm called CTSP for closed sequential pattern mining with time constraints. CTSP loads the database into memory and constructs time-indexes to facilitate both pattern mining and closure checking, within the pattern growth framework. The index sets are utilized to efficiently mine the patterns without generating any candidate or sub-database. The bidirectional closure checking strategy further speeds up the mining. The comprehensive experiments with both synthetic and real datasets show that CTSP efficiently mines closed sequential patterns satisfying the time constraints, and has good linear scalability with respect to the database size.

[1]  Jiawei Han,et al.  BIDE: efficient mining of frequent closed sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[2]  Kyuseok Shim,et al.  SPIRIT: Sequential Pattern Mining with Regular Expression Constraints , 1999, VLDB.

[3]  Johannes Gehrke,et al.  Sequential PAttern mining using a bitmap representation , 2002, KDD.

[4]  Mohammed J. Zaki Sequence mining in categorical domains: incorporating constraints , 2000, CIKM '00.

[5]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[6]  Arbee L. P. Chen,et al.  An efficient algorithm for mining frequent sequences by a new strategy without support counting , 2004, Proceedings. 20th International Conference on Data Engineering.

[7]  Suh-Yin Lee,et al.  Fast Discovery of Sequential Patterns through Memory Indexing and Database Partitioning , 2005, J. Inf. Sci. Eng..

[8]  Suh-Yin Lee,et al.  Efficient mining of sequential patterns with time constraints by delimited pattern growth , 2005, Knowledge and Information Systems.

[9]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[10]  George Karypis,et al.  SLPMiner: an algorithm for finding frequent sequential patterns using length-decreasing support constraint , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[11]  David A. Padua,et al.  Parallel mining of closed sequential patterns , 2005, KDD '05.

[12]  Jiawei Han,et al.  TSP: Mining top-k closed sequential patterns , 2004, Knowledge and Information Systems.

[13]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[14]  Salvatore Orlando,et al.  A new algorithm for gap constrained sequence mining , 2004, SAC '04.

[15]  Maguelonne Teisseire,et al.  Pre-processing time constraints for efficiently mining generalized sequential patterns , 2004, Proceedings. 11th International Symposium on Temporal Representation and Reasoning, 2004. TIME 2004..

[16]  Jiawei Han,et al.  TSP: mining top-K closed sequential patterns , 2003, Third IEEE International Conference on Data Mining.

[17]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[18]  Jian Pei,et al.  Mining sequential patterns with constraints in large databases , 2002, CIKM '02.

[19]  Mohammed J. Zaki,et al.  SPADE: An Efficient Algorithm for Mining Frequent Sequences , 2004, Machine Learning.

[20]  Carla E. Brodley,et al.  KDD-Cup 2000 organizers' report: peeling the onion , 2000, SKDD.