Efficient frequent sequence mining by a dynamic strategy switching algorithm

Mining frequent sequences in large databases has been an important research topic. The main challenge is the high processing cost caused by the large amount of data. In this paper, we propose a novel strategy that finds all the frequent sequences without having to compute the support counts of non-frequent sequences. Previous works prune candidate sequences based on shorter frequent sequences, whereas our strategy prunes candidate sequences according to non-frequent sequences of the same length. As a result, our strategy can be combined with the previous works to achieve better performance. We identify three major strategies used in the previous works and combine them with our strategy into an efficient algorithm. The novelty of our algorithm lies in its ability to switch dynamically from a previous strategy to our new strategy during the mining process for better performance. Experimental results show that our algorithm outperforms the previous ones under various parameter settings.
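For orientation, the baseline idea attributed above to previous works, pruning candidates using shorter frequent sequences, can be sketched as follows. This is only an illustrative level-wise miner, not the algorithm proposed in this paper: it assumes each sequence is a simple ordered list of single items, and the function names (`is_subsequence`, `support`, `mine_frequent_sequences`) are hypothetical. Note that it still computes support counts for non-frequent candidates that survive pruning, which is the cost the proposed strategy aims to avoid.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Count the database sequences that contain pattern as a subsequence."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def mine_frequent_sequences(database, min_support):
    """Level-wise mining with pruning by shorter frequent sequences.

    Assumption: every element of a sequence is a single item.
    Returns a dict mapping each frequent sequence (a tuple) to its support.
    """
    items = sorted({item for seq in database for item in seq})
    frequent = {}
    level = [(item,) for item in items]  # length-1 candidates
    while level:
        # Support counting: one pass over the database per candidate.
        current = {}
        for cand in level:
            count = support(cand, database)
            if count >= min_support:
                current[cand] = count
        frequent.update(current)

        # Candidate generation: extend each frequent sequence by one item,
        # and keep the result only if every subsequence obtained by deleting
        # one element is itself frequent (anti-monotone pruning).
        next_level = []
        for seq in current:
            for item in items:
                cand = seq + (item,)
                if all(cand[:i] + cand[i + 1:] in current
                       for i in range(len(cand))):
                    next_level.append(cand)
        level = next_level
    return frequent
```

For example, calling `mine_frequent_sequences(["abc", "acb", "ab", "bc"], 2)` treats each string as a sequence of one-character items. In that run only one length-3 candidate, `('a', 'b', 'c')`, survives pruning, yet its support must still be counted before it can be rejected as non-frequent.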
