Adaptive, Automatic Stream Mining

Sensor devices and embedded processors are becoming widespread, especially in measurement and monitoring applications. Their limited resources (CPU, memory, and/or communication bandwidth and power) pose interesting challenges. We need concise, expressive models that represent the important features of the data and lend themselves to efficient estimation. In particular, under these severe constraints, we want models and estimation methods that (a) require little memory and a single pass over the data, (b) can adapt to and handle arbitrary periodic components, and (c) can deal with various types of noise. We propose AWSOM (Arbitrary Window Stream mOdeling Method), which allows sensors in remote or hostile environments to efficiently and effectively discover interesting patterns and trends. This can be done automatically, i.e., with no prior inspection of the data and no user intervention or expert tuning before or during data gathering. Our algorithms require limited resources and can thus be incorporated in sensors, possibly alongside a distributed query processing engine. Updates are performed in constant time with respect to stream size, using logarithmic space. Existing forecasting methods (SARIMA, GARCH, etc.) and “traditional” Fourier and wavelet analysis fall short on one or more of these requirements. To the best of our knowledge, AWSOM is the first framework that combines all of the above characteristics.
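To make the resource claim concrete, the following is a minimal sketch of a single-pass summary that uses logarithmic space. It is not the AWSOM algorithm itself, whose details the abstract does not give. It is a generic streaming Haar transform, and the class name `StreamingHaar` and its methods are hypothetical names chosen for illustration. It keeps one pending value per resolution level, so after n samples it stores about log2(n) numbers. Each update does constant work on average, although an individual update may cascade through up to log2(n) levels.

```python
import math


class StreamingHaar:
    """Illustrative single-pass Haar transform (hypothetical helper, not AWSOM).

    Keeps at most one unpaired smooth value per level, so memory grows
    logarithmically with the number of samples seen.
    """

    def __init__(self):
        self.pending = []  # pending[l] is the unpaired smooth value at level l, or None
        self.count = 0     # number of samples consumed so far

    def update(self, x, emit=None):
        """Consume one sample; call emit(level, detail) for each finished coefficient."""
        self.count += 1
        level, value = 0, float(x)
        while True:
            if level == len(self.pending):
                self.pending.append(None)
            if self.pending[level] is None:
                # No partner yet at this level: store the value and stop.
                self.pending[level] = value
                return
            # A partner exists: combine the pair into one detail and one smooth value.
            left = self.pending[level]
            self.pending[level] = None
            detail = (left - value) / math.sqrt(2.0)
            value = (left + value) / math.sqrt(2.0)
            if emit is not None:
                emit(level + 1, detail)
            level += 1  # carry the smooth value up to the next coarser level


if __name__ == "__main__":
    haar = StreamingHaar()
    details = []
    for sample in range(1, 9):
        haar.update(sample, emit=lambda lvl, d: details.append((lvl, d)))
    print(len(haar.pending), len(details))
```

A model such as the one the abstract describes would be fitted on top of a summary like this. The sketch only shows why a stream of any length can be summarized incrementally without storing the raw samples.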
