A new framework for detecting weighted sequential patterns in large sequence databases

Sequential pattern mining is an essential research topic with broad applications. It discovers the set of frequent subsequences in a sequence database that satisfy a support threshold. The major problems in mining sequential patterns are that a huge set of patterns is generated and that computation time is high. Although efficient algorithms have been developed to tackle these problems, their performance degrades dramatically when mining long sequential patterns in dense databases or when using low minimum supports. In addition, these algorithms may reduce the number of patterns, but unimportant patterns still appear in the result. It would be better to prune unimportant patterns first, so that mining yields fewer but more important patterns. In this paper, we suggest a new framework for mining weighted frequent patterns in which weight constraints are pushed deeply into sequential pattern mining. Previous sequential pattern mining algorithms treat sequential patterns uniformly, whereas real sequential patterns differ in importance. In our approach, items are given weights according to their priority or importance. During the mining process, we consider not only the supports but also the weights of patterns. Based on this framework, we present a weighted sequential pattern mining algorithm (WSpan). To our knowledge, this is the first work to mine weighted sequential patterns. The experimental results show that WSpan detects fewer but important weighted sequential patterns in large sequence databases, even with a low minimum threshold.
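To make the idea of considering weights alongside supports concrete, the following is a minimal sketch and not the WSpan algorithm itself. It rests on several assumptions that the abstract does not state: each sequence is a flat list of single items rather than a list of itemsets, the weight of a pattern is taken to be the average weight of its items, and a pattern is kept when support multiplied by that weight reaches a threshold. The function names, the toy database, and the weight values are all hypothetical.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator until the item is found,
    # so later items are only searched for after earlier matches.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Count the sequences in the database that contain `pattern` as a subsequence."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def pattern_weight(pattern: Sequence[str], weights: Dict[str, float]) -> float:
    """Assumed definition: the average of the item weights in the pattern."""
    return sum(weights[item] for item in pattern) / len(pattern)


def is_weighted_frequent(
    pattern: Sequence[str],
    database: List[Sequence[str]],
    weights: Dict[str, float],
    min_weighted_support: float,
) -> bool:
    """Assumed criterion: support * average weight must reach the threshold."""
    weighted = support(pattern, database) * pattern_weight(pattern, weights)
    return weighted >= min_weighted_support


# Hypothetical toy data: four sequences and per-item importance weights.
database = [["a", "b", "c"], ["a", "c"], ["b", "c", "a"], ["a", "b"]]
weights = {"a": 0.9, "b": 0.5, "c": 0.2}

for candidate in (["a", "b"], ["a", "c"]):
    print(candidate, support(candidate, database),
          is_weighted_frequent(candidate, database, weights, 1.2))
```

In this toy data both candidates have the same support, yet only the one made of higher-weight items passes the assumed threshold, which illustrates how weights can filter out patterns that support alone would keep.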
