STAGGER: Periodicity Mining of Data Streams Using Expanding Sliding Windows

Sensor devices are becoming ubiquitous, especially in measurement and monitoring applications. Because of the real-time, append-only and semi-infinite natures of the generated sensor data streams, an online incremental approach is a necessity for mining stream data types. In this paper, we propose STAGGER: a one-pass, online and incremental algorithm for mining periodic patterns in data streams. STAGGER does not require that the user pre-specify the periodicity rate of the data. Instead, STAGGER discovers the potential periodicity rates. STAGGER maintains multiple expanding sliding windows staggered over the stream, where computations are shared among the multiple overlapping windows. Small-length sliding windows are imperative for early and real-time output, yet are limited to discover short periodicity rates. As streamed data arrives continuously, the sliding windows expand in length in order to cover the whole stream. Larger-length sliding windows are able to discover longer periodicity rates. STAGGER incrementally maintains a tree-like data structure for the frequent periodic patterns of each discovered potential periodicity rate. In contrast to the Fourier/Wavelet-based approaches used for discovering periodicity rates, STAGGER not only discovers a wider, more accurate set of periodicities, but also discovers the periodic patterns themselves. In fact, experimental results with real and synthetic data sets show that STAGGER outperforms Fourier/Wavelet-based approaches by an order of magnitude in terms of the accuracy of the discovered periodicity rates. Moreover, real-data experiments demonstrate the practicality of the discovered periodic patterns.

[1]  Jeffrey Scott Vitter,et al.  External memory algorithms and data structures: dealing with massive data , 2001, CSUR.

[2]  Walid G. Aref,et al.  Periodicity detection in time series databases , 2005, IEEE Transactions on Knowledge and Data Engineering.

[3]  Jiawei Han,et al.  Efficient mining of partial periodic patterns in time series database , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[4]  Yixin Chen,et al.  Multi-Dimensional Regression Analysis of Time-Series Data Streams , 2002, VLDB.

[5]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[6]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[7]  Ronald L. Rivest,et al.  Introduction to Algorithms , 1990 .

[8]  Sridhar Ramaswamy,et al.  Cyclic association rules , 1998, Proceedings 14th International Conference on Data Engineering.

[9]  Philip S. Yu,et al.  A Framework for Clustering Evolving Data Streams , 2003, VLDB.

[10]  Johannes Gehrke,et al.  Sequential PAttern mining using a bitmap representation , 2002, KDD.

[11]  Sudipto Guha,et al.  Clustering Data Streams , 2000, FOCS.

[12]  Walid G. Aref,et al.  Using Convolution to Mine Obscure Periodic Patterns in One Pass , 2004, EDBT.

[13]  Philip S. Yu,et al.  On demand classification of data streams , 2004, KDD.

[14]  Sudipto Guha,et al.  Streaming-data algorithms for high-quality clustering , 2002, Proceedings 18th International Conference on Data Engineering.

[15]  M. D. MacLaren The Art of Computer Programming. Volume 2: Seminumerical Algorithms (Donald E. Knuth) , 1970 .

[16]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules and sequential patterns , 1996 .

[17]  V. Altuzar,et al.  Atmospheric pollution profiles in Mexico City in two different seasons , 2003 .

[18]  Kyuseok Shim,et al.  SPIRIT: Sequential Pattern Mining with Regular Expression Constraints , 1999, VLDB.

[19]  Jiawei Han,et al.  Mining Segment-Wise Periodic Patterns in Time-Related Databases , 1998, KDD.

[20]  Joseph L. Hellerstein,et al.  Mining partially periodic event patterns with unknown periods , 2001, Proceedings 17th International Conference on Data Engineering.

[21]  Moses Charikar,et al.  Finding frequent items in data streams , 2002, Theor. Comput. Sci..

[22]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[23]  Christos Faloutsos,et al.  Adaptive, Hands-Off Stream Mining , 2003, VLDB.

[24]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[25]  Giuseppe Psaila,et al.  Querying Shapes of Histories , 1995, VLDB.

[26]  Christos Faloutsos,et al.  AWSOM: Adaptive, Hands-Off Stream Mining , 2003 .

[27]  Walid G. Aref,et al.  Incremental, online, and merge mining of partial periodic patterns in time-series databases , 2004, IEEE Transactions on Knowledge and Data Engineering.

[28]  Eamonn J. Keogh,et al.  Segmenting Time Series: A Survey and Novel Approach , 2002 .

[29]  Like Gao,et al.  Continually evaluating similarity-based pattern queries on a streaming time series , 2002, SIGMOD '02.

[30]  C. Finney,et al.  A review of symbolic analysis of experimental data , 2003 .

[31]  Donald Ervin Knuth,et al.  The Art of Computer Programming , 1968 .

[32]  Eamonn J. Keogh,et al.  A symbolic representation of time series, with implications for streaming algorithms , 2003, DMKD '03.

[33]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[34]  Philip S. Yu,et al.  Mining asynchronous periodic patterns in time series data , 2000, KDD '00.

[35]  Kyuseok Shim,et al.  Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases , 1995, VLDB.

[36]  JOSEPH M. Smith,et al.  ICD therapy for the prevention of sudden cardiac death in post-MI patients , 2003, Current treatment options in cardiovascular medicine.