TKS: Efficient Mining of Top-K Sequential Patterns

Sequential pattern mining is a well-studied data mining task with wide applications. However, tuning the minsup (minimum support) parameter of sequential pattern mining algorithms so that they generate enough patterns is difficult and time-consuming. To address this issue, the task of top-k sequential pattern mining has been defined, where k, the number of sequential patterns to be found, is set by the user. In this paper, we present an efficient algorithm for this problem named TKS (Top-K Sequential pattern mining). TKS uses a vertical bitmap database representation, a novel data structure named PMAP (Precedence Map), and several efficient strategies to prune the search space. An extensive experimental study on real datasets shows that TKS outperforms TSP, the current state-of-the-art algorithm for top-k sequential pattern mining, by more than an order of magnitude in execution time and memory.
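To make the top-k task concrete, the following is a minimal Python sketch of the general idea: search depth-first over a vertical representation of the database, keep the k best patterns found so far, and raise an internal minsup as that set fills up. It is not the TKS implementation. It assumes sequences of single items (no itemsets), uses position lists in place of bitmaps, omits PMAP, and does not order candidates by how promising they are. The names top_k_sequential_patterns, positions, save, and extend are hypothetical and chosen for illustration only.

```python
import heapq
from collections import defaultdict


def top_k_sequential_patterns(database, k):
    """Return up to k most frequent sequential patterns as (support, pattern) pairs.

    Simplifying assumption: each sequence is a list of single items, so a
    pattern is only ever extended by appending an item that occurs later.
    """
    # Vertical representation: item -> {sequence id: positions of the item}.
    positions = defaultdict(lambda: defaultdict(list))
    for sid, sequence in enumerate(database):
        for pos, item in enumerate(sequence):
            positions[item][sid].append(pos)

    top = []    # min-heap of (support, pattern), holding at most k entries
    minsup = 1  # internal threshold, raised once k patterns have been found

    def save(pattern, support):
        nonlocal minsup
        heapq.heappush(top, (support, pattern))
        if len(top) > k:
            heapq.heappop(top)  # drop the currently weakest pattern
        if len(top) == k:
            minsup = max(minsup, top[0][0])

    def extend(pattern, first_end):
        # first_end maps sid -> earliest position where `pattern` can end.
        for item in sorted(positions):
            new_end = {}
            for sid, end in first_end.items():
                later = [p for p in positions[item].get(sid, ()) if p > end]
                if later:
                    new_end[sid] = later[0]
            support = len(new_end)
            # Any extension has support <= this support, so the whole
            # branch can be pruned when it falls below the threshold.
            if support >= minsup:
                new_pattern = pattern + (item,)
                save(new_pattern, support)
                extend(new_pattern, new_end)

    # The empty pattern "ends" before position 0 in every sequence.
    extend((), {sid: -1 for sid in range(len(database))})
    return sorted(top, key=lambda entry: (-entry[0], entry[1]))


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    for support, pattern in top_k_sequential_patterns(db, 3):
        print(support, pattern)
```

The user supplies only k. The threshold starts at 1 and rises as the result set fills, which is what removes the need to guess minsup in advance. When several patterns tie at the k-th support value, this sketch keeps an arbitrary subset of them.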
