Mining sequential patterns from data streams: a centroid approach

In recent years, emerging applications have introduced new constraints for data mining methods. These constraints are typical of a new kind of data: data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and must be processed in linear time, no blocking operator can be used, and the data can be examined only once. To date, only a few methods have been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatorial explosion inherent in sequential pattern mining. In this paper, we propose an algorithm based on sequence alignment for mining approximate sequential patterns in Web usage data streams. To meet the one-scan constraint, we propose a greedy clustering algorithm combined with an alignment method. We show that our proposal is able to extract relevant sequences with very low thresholds.
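The idea of one-scan greedy clustering with an aligned centroid can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes each navigation is a flat list of page identifiers, uses Python's `difflib.SequenceMatcher` as a stand-in for the alignment method, and the names `Cluster`, `cluster_stream`, `threshold` and `min_ratio` are hypothetical. Each incoming sequence is compared once with the existing centroids, then either merged into the most similar cluster or used to open a new one.

```python
from difflib import SequenceMatcher


class Cluster:
    """One cluster summarized by a centroid of aligned columns (illustrative only)."""

    def __init__(self, seq):
        # Each column is an [item, count] pair: the item and how many
        # sequences of the cluster were aligned on it.
        self.columns = [[item, 1] for item in seq]
        self.size = 1

    def items(self):
        return [item for item, _ in self.columns]

    def similarity(self, seq):
        # Assumption: difflib's ratio stands in for an alignment score.
        return SequenceMatcher(None, self.items(), seq, autojunk=False).ratio()

    def add(self, seq):
        # Align the new sequence with the centroid and merge it in.
        matcher = SequenceMatcher(None, self.items(), seq, autojunk=False)
        merged = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Matched items: increment the column counts.
                merged.extend([item, count + 1] for item, count in self.columns[i1:i2])
            else:
                # Unmatched centroid columns are kept as they are,
                # unmatched new items become columns with count 1.
                merged.extend(self.columns[i1:i2])
                merged.extend([item, 1] for item in seq[j1:j2])
        self.columns = merged
        self.size += 1

    def pattern(self, min_ratio):
        # Approximate pattern: items present in at least min_ratio of the cluster.
        return [item for item, count in self.columns if count >= min_ratio * self.size]


def cluster_stream(stream, threshold=0.5):
    """Greedy single-pass clustering: every sequence is read exactly once."""
    clusters = []
    for seq in stream:
        best = max(clusters, key=lambda c: c.similarity(seq), default=None)
        if best is not None and best.similarity(seq) >= threshold:
            best.add(seq)
        else:
            clusters.append(Cluster(seq))
    return clusters


# Toy batch of navigations (hypothetical page identifiers).
batch = [["a", "b", "c"], ["a", "b", "d", "c"], ["x", "y"], ["a", "c"]]
clusters = cluster_stream(batch)
for cluster in clusters:
    print(cluster.size, cluster.pattern(min_ratio=0.6))
```

Because the centroid keeps a count per aligned item, the filter in `pattern` can be lowered or raised after clustering without rescanning the data, which is compatible with the one-scan constraint described above.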
