Discovering interesting sequential pattern in large sequence database

Sequential pattern mining is an important data mining problem with broad applications. Most previous sequential pattern mining algorithms generate an exponentially large number of patterns and treat all items and sequences uniformly. It would be better if unimportant patterns could be pruned first, so that mining yields fewer but more important patterns. In this paper, we propose a new algorithm for mining interesting sequential patterns. On the one hand, the resulting patterns are maximal, which reduces the number of discovered sequences. On the other hand, weights are used so that only important sequential patterns are discovered. To enhance mining efficiency, we prove that the downward closure property of frequent patterns is retained in the proposed algorithm. Experimental results show that the algorithm is efficient and effective.
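
The three ideas named in the abstract (weights, downward closure, and maximal patterns) can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not define its weighting scheme, so the code assumes one weight per sequence and defines weighted support as the weight of the sequences containing a pattern divided by the total weight. It also simplifies sequences to ordered lists of single items rather than itemsets. All function names (`is_subsequence`, `weighted_support`, `mine_frequent`, `maximal_only`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def weighted_support(pattern: Pattern, database: List[Pattern],
                     weights: List[float]) -> float:
    """Assumed measure: weight of supporting sequences over total weight."""
    covered = sum(w for seq, w in zip(database, weights)
                  if is_subsequence(pattern, seq))
    return covered / sum(weights)


def mine_frequent(database: List[Pattern], weights: List[float],
                  min_wsup: float) -> List[Pattern]:
    """Level-wise search that extends each frequent pattern by one item.

    With non-negative weights, every sequence that contains a pattern also
    contains each of its prefixes, so an infrequent pattern cannot have a
    frequent extension and is never extended (downward closure).
    """
    if min_wsup <= 0:
        raise ValueError("min_wsup must be positive for the search to stop")
    items = sorted({x for seq in database for x in seq})
    frequent: List[Pattern] = []
    frontier: List[Pattern] = [()]
    while frontier:
        next_frontier: List[Pattern] = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                if weighted_support(candidate, database, weights) >= min_wsup:
                    frequent.append(candidate)
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


def maximal_only(patterns: List[Pattern]) -> List[Pattern]:
    """Keep patterns that are not a subsequence of another pattern in the list."""
    return [p for p in patterns
            if not any(p != q and is_subsequence(p, q) for q in patterns)]


# Toy database; the weights are made-up importance scores per sequence.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "d")]
weights = [2.0, 1.0, 2.0, 1.0]

frequent = mine_frequent(database, weights, min_wsup=0.5)
maximal = maximal_only(frequent)
```

On this toy input, seven patterns meet the threshold, and the maximal filter reduces them to the single pattern `("a", "b", "c")`, which shows how reporting only maximal patterns shrinks the output without losing the frequent subsequences it implies.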
