WSpan: Weighted Sequential pattern mining in large sequence databases

Sequential pattern mining algorithms find the set of frequent subsequences that satisfy a minimum support constraint in a sequence database. However, previous algorithms treat all sequential patterns uniformly, even though patterns differ in importance. Another major problem is that most sequence mining algorithms still generate an exponentially large number of sequential patterns when the minimum support is lowered, and they provide no way to adjust the number of patterns other than raising the minimum support. In this paper, we propose a weighted sequential pattern mining algorithm called WSpan. Our main approach is to push weight constraints into the sequential pattern growth approach while maintaining the downward closure property. To this end, a weight range is defined and items are assigned different weights within that range. When the sequence database is scanned, the maximum weight in the database is used to prune weighted infrequent sequential patterns; in the mining step, the maximum weights of the projected sequence databases are used instead. In this way the downward closure property is preserved. By adjusting the weight range, WSpan generates fewer but more important weighted sequential patterns in large databases, particularly dense databases with a low minimum support.
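The pruning idea can be illustrated with a minimal Python sketch. It is not the WSpan implementation, and several details are assumptions made only for illustration: sequences are flat lists of single items rather than itemsets, the weighted support of a pattern is taken to be its support multiplied by the average weight of its items, and the function name `wspan_sketch` and its parameters are hypothetical. The pruning bound uses the larger of the prefix's own maximum weight and the maximum weight in the projected database, a conservative choice for this sketch that keeps the bound valid for every extension of the prefix.

```python
from collections import defaultdict


def wspan_sketch(sequences, weights, min_sup):
    """Pattern-growth mining with a maximum-weight pruning bound.

    Assumptions (not taken from the paper): each sequence is a flat list
    of items, and a pattern is reported when
    support * average item weight >= min_sup.
    Returns a dict mapping each reported pattern (a tuple) to its support.
    """
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected suffix.
        support = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                support[item] += 1
        if not support:
            return

        # Upper bound on the weight of any pattern grown from this prefix:
        # the largest weight among the prefix items and the items that
        # still occur in the projected database.
        max_w = max(weights[i] for i in support)
        if prefix:
            max_w = max(max_w, max(weights[i] for i in prefix))

        for item, sup in sorted(support.items()):
            # Even with the best possible weight, neither this pattern nor
            # any extension of it can reach min_sup, so skip it.
            if sup * max_w < min_sup:
                continue
            pattern = prefix + (item,)
            avg_w = sum(weights[i] for i in pattern) / len(pattern)
            if sup * avg_w >= min_sup:
                results[pattern] = sup
            # Project on the first occurrence of the item and recurse.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_projected)

    mine((), [list(s) for s in sequences])
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
    item_weights = {"a": 1.0, "b": 0.5, "c": 0.8}
    print(wspan_sketch(db, item_weights, min_sup=2))
```

With all weights set to 1.0 the sketch reduces to ordinary support-based mining; lowering the weight of an item such as `b` removes the patterns that contain it from the output without changing the minimum support.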
