TSP: mining top-K closed sequential patterns

Sequential pattern mining has been studied extensively in data mining community. Most previous studies require the specification of a minimum support threshold to perform the mining. However, it is difficult for users to provide an appropriate threshold in practice. To overcome this difficulty, we propose an alternative task: mining top-k frequent closed sequential patterns of length no less than min-l, where k is the desired number of closed sequential patterns to be mined, and minl, is the minimum length of each pattern. We mine closed patterns since they are compact representations of frequent patterns. We developed an efficient algorithm, called TSP, which makes use of the length constraint and the properties of top-k closed sequential patterns to perform dynamic support-raising and projected database-pruning. Our extensive performance study shows that TSP outperforms the closed sequential pattern mining algorithm even when the latter is running with the best tuned minimum support threshold.

[1]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[2]  Jiawei Han,et al.  Mining top-k frequent closed patterns without minimum support , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[3]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[4]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[5]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[6]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.

[7]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[8]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[9]  Heikki Mannila,et al.  Discovering Frequent Episodes in Sequences , 1995, KDD.

[10]  Mohammed J. Zaki,et al.  SPADE: An Efficient Algorithm for Mining Frequent Sequences , 2004, Machine Learning.

[11]  Sudipto Guha,et al.  ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).