PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth

Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of A priori which may substantially reduce the number of combinations to be examined. Howeve6 Apriori still encounters problems when a sequence database is large andor when sequential patterns to be mined are numerous ano we propose a novel sequential pattern mining method, called Prefixspan (i.e., Prefix-projected - Ettern_ mining), which explores prejxprojection in sequential pattern mining. Prefixspan mines the complete set of patterns but greatly reduces the efforts of candidate subsequence generation. Moreover; prefi-projection substantially reduces the size of projected databases and leads to efJicient processing. Our performance study shows that Prefixspan outperforms both the Apriori-based GSP algorithm and another recently proposed method; Frees pan, in mining large sequence data bases.

[1]  R. Agarwal Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[2]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[3]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[4]  Sushil Jajodia,et al.  Mining Temporal Relationships with Multiple Granularities in Time Sequences , 1998, IEEE Data Eng. Bull..

[5]  Sridhar Ramaswamy,et al.  Cyclic association rules , 1998, Proceedings 14th International Conference on Data Engineering.

[6]  Jiawei Han,et al.  Efficient mining of partial periodic patterns in time series database , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[7]  Kyuseok Shim,et al.  SPIRIT: Sequential Pattern Mining with Regular Expression Constraints , 1999, VLDB.

[8]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[9]  Umeshwar Dayal,et al.  FreeSpan: frequent pattern-projected sequential pattern mining , 2000, KDD '00.

[10]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[11]  Sourav S. Bhowmick,et al.  Sequential Pattern Mining: A Survey , 2003 .