Mining closed sequences with constraint based on BIDE algorithm

Mining sequential patterns is a common data mining task in many real-life applications. Existing algorithms such as CAMLS (Constraint-based Apriori Algorithm for Mining Long Sequences) mine the complete set of frequent (long) sequences that satisfy a minimum support (min_sup) threshold in a sequence database. However, mining long sequences generates an explosive number of frequent sequences. Every subsequence of a frequent sequence is also frequent, so a single frequent sequence of length n can imply up to 2^n - 1 frequent subsequences. This is prohibitively costly in both runtime and storage space. In this paper, we propose to improve the CAMLS algorithm so that it produces only closed sequences. Instead of mining the full set of frequent sequences, we mine only the typically much smaller set of closed sequences, i.e., frequent sequences that have no proper supersequence with the same support. Our goal is to mine closed sequences from long sequences by combining the BIDE algorithm with the improved CAMLS algorithm, and to make the pruning strategy even more efficient. BIDE (BI-Directional Extension) is an efficient algorithm for mining closed sequences that does not follow the candidate maintenance-and-test paradigm. Rather than keeping the closed sequences mined so far and testing each new pattern against them, it checks closure directly through forward and backward extensions of the current pattern.
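To make the notions of support, closedness, and forward/backward extension concrete, here is a minimal sketch in Python. It is not the authors' improved CAMLS/BIDE implementation. It assumes each sequence element is a single item (no itemsets) and that `min_sup` is an absolute count. Patterns are grown over projected databases (PrefixSpan-style pattern growth, without candidate generation). Each frequent pattern is then tested for closure by trying every one-item extension. Appending an item corresponds to a forward extension, and inserting one at the front or between items corresponds to a backward extension. BIDE's BackScan pruning and its efficient computation of extension items are omitted. The function names (`support`, `is_closed`, `mine_closed`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]  # assumption: one item per sequence element


def is_subsequence(pattern: Seq, seq: Seq) -> bool:
    """True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `item in it` consumes the iterator up to the match, preserving order.
    return all(item in it for item in pattern)


def support(pattern: Seq, db: List[Seq]) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in db)


def is_closed(pattern: Seq, db: List[Seq], items: List[str]) -> bool:
    """Closure test by one-item extensions (hypothetical helper).

    A pattern is non-closed iff some single-item insertion keeps its
    support. Position len(pattern) is a forward extension (append);
    positions 0..len(pattern)-1 are backward extensions.
    """
    sup = support(pattern, db)
    for pos in range(len(pattern) + 1):
        for item in items:
            extended = pattern[:pos] + (item,) + pattern[pos:]
            if support(extended, db) == sup:
                return False
    return True


def mine_closed(db: List[Seq], min_sup: int) -> Dict[Seq, int]:
    """Pattern growth over projected databases, keeping closed patterns."""
    counts = Counter(item for s in db for item in set(s))
    # Only frequent items can extend a frequent pattern with equal support.
    items = sorted(i for i, c in counts.items() if c >= min_sup)
    closed: Dict[Seq, int] = {}

    def grow(prefix: Seq, projected: List[Seq]) -> None:
        # Count each item once per projected (postfix) sequence.
        local = Counter(item for s in projected for item in set(s))
        for item in sorted(local):
            if local[item] < min_sup:
                continue  # anti-monotone pruning: no frequent extensions
            pattern = prefix + (item,)
            # Postfix after the first occurrence of `item` in each sequence.
            next_proj = [s[s.index(item) + 1:] for s in projected if item in s]
            if is_closed(pattern, db, items):
                closed[pattern] = local[item]
            grow(pattern, next_proj)

    grow((), db)
    return closed


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d")]
    print(mine_closed(db, min_sup=2))
```

For the database ⟨abc⟩, ⟨ac⟩, ⟨abcd⟩ with min_sup = 2, seven sequences are frequent (a, b, c, ab, ac, bc, abc), but only two are closed: ⟨ac⟩ with support 3 and ⟨abc⟩ with support 2. Every other frequent sequence has a supersequence among these two with the same support, so it is not reported.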
