SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows

Previous studies have shown that mining closed patterns is more beneficial than mining the complete set of frequent patterns, since closed pattern mining yields more compact results and admits more efficient algorithms. This is particularly useful in a data stream environment, where memory and computation power are major concerns. This paper studies the problem of mining closed sequential patterns over data stream sliding windows. A synopsis structure, IST (Inverse Closed Sequence Tree), is designed to keep the inverse closed sequential patterns in the current window. An efficient algorithm, SeqStream, is developed to mine closed sequential patterns in stream windows incrementally, and various novel strategies are adopted in SeqStream to prune the search space aggressively. Extensive experiments on both real and synthetic data sets show that SeqStream outperforms PrefixSpan, CloSpan and BIDE by about one to two orders of magnitude.
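
To make the problem statement concrete, the following is a minimal brute-force sketch of what "closed sequential patterns over a sliding window" means. It is not SeqStream and does not build an IST: it recomputes every window from scratch, which is exactly the cost an incremental algorithm is meant to avoid. It assumes the standard definition of a closed pattern (a frequent pattern with no proper super-sequence of equal support) and simplifies each sequence to a string of single items. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, window):
    """Number of sequences in the window that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in window)


def frequent_patterns(window, min_support):
    """Brute force: enumerate every subsequence, keep the frequent ones."""
    candidates = set()
    for seq in window:
        for k in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))
    frequent = {}
    for pattern in candidates:
        count = support(pattern, window)
        if count >= min_support:
            frequent[pattern] = count
    return frequent


def closed_patterns(window, min_support):
    """Keep frequent patterns with no proper super-sequence of equal support."""
    freq = frequent_patterns(window, min_support)
    return {
        p: s
        for p, s in freq.items()
        if not any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in freq.items()
        )
    }


def closed_patterns_per_window(stream, window_size, min_support):
    """Yield the closed patterns of each full window (recomputed from scratch)."""
    window = deque(maxlen=window_size)  # oldest sequence drops out automatically
    for seq in stream:
        window.append(tuple(seq))
        if len(window) == window_size:
            yield closed_patterns(window, min_support)


if __name__ == "__main__":
    for result in closed_patterns_per_window(["abc", "abd", "acd", "bcd"], 3, 2):
        print(result)
```

Even on this toy stream the closed set is smaller than the frequent set: in the first window, the pattern `b` is frequent but not closed, because `ab` has the same support. The exponential enumeration in `frequent_patterns` is only workable for very short sequences.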
