Efficient Mining of Closed Sequential Patterns on Stream Sliding Window

Sequential pattern mining is a classic data mining topic that has been studied extensively over the past decade. More recently, mining sequential patterns incrementally over stream data has attracted great interest. Stream data poses challenges that are far less prominent in static data mining, so many design decisions have to be reconsidered carefully. In this paper, we propose a novel algorithm that solves the frequent closed sequential pattern mining problem efficiently over stream data. The algorithm stores only frequent closed prefixes in its enumeration tree, which it uses to mine and maintain the patterns in the current sliding window. We have also devised effective search space pruning and pattern closure checking strategies to accelerate the algorithm. Experimental results show that our algorithm significantly outperforms other state-of-the-art algorithms in both running time and memory use.
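
To make the problem statement concrete, the following is a minimal brute-force sketch of what "frequent closed sequential patterns in the current sliding window" means. It is not the algorithm proposed in the paper: it recomputes everything from scratch for each window and uses no enumeration tree or pruning. It also assumes, for simplicity, that every sequence element is a single item rather than an itemset. All function names are hypothetical.

```python
from collections import deque

def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern, window):
    """Number of sequences in the window that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in window)

def frequent_patterns(window, min_sup):
    """Enumerate frequent patterns by naive prefix growth (min_sup >= 1)."""
    items = sorted({x for s in window for x in s})
    result = {}

    def grow(prefix):
        for x in items:
            cand = prefix + (x,)
            sup = support(cand, window)
            if sup >= min_sup:  # an infrequent prefix cannot be extended
                result[cand] = sup
                grow(cand)

    grow(())
    return result

def closed_patterns(window, min_sup):
    """Keep patterns that have no frequent super-sequence with equal support."""
    freq = frequent_patterns(window, min_sup)
    return {
        p: s for p, s in freq.items()
        if not any(
            len(q) > len(p) and freq[q] == s and is_subsequence(p, q)
            for q in freq
        )
    }

# Illustrative stream; the window keeps only the 3 most recent sequences.
stream = [("a", "b", "c"), ("a", "b", "d"), ("a", "c"), ("b", "c")]
window = deque(maxlen=3)
for seq in stream:
    window.append(seq)  # the oldest sequence is evicted automatically
    if len(window) == window.maxlen:
        print(list(window), closed_patterns(window, min_sup=2))
```

In the first full window, `("b",)` and `("c",)` are frequent but not closed, because `("a", "b")` and `("a", "c")` have the same support. Once the window slides by one sequence, the set of closed patterns changes, which is the maintenance problem an incremental algorithm has to handle without recomputing from scratch.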
