A unified framework for monitoring data streams in real time

Online monitoring of data streams poses a challenge in many data-centric applications, such as telecommunications networks, traffic management, trend-related analysis, Web-click streams, intrusion detection, and sensor networks. Mining techniques employed in these applications have to be efficient in terms of space usage and per-item processing time while providing high-quality answers to (1) aggregate monitoring queries, such as finding surprising levels of a data stream and detecting bursts, and (2) similarity queries, such as detecting correlations and finding interesting patterns. The most important aspect of these tasks is their need for flexible query lengths: it is difficult to set appropriate lengths a priori. For example, bursts of events can occur at variable temporal modalities, from hours to days to weeks, and correlated trends can likewise occur at various temporal scales. The system has to discover "interesting" behavior online and monitor it over flexible window sizes. In this paper, we propose a multi-resolution indexing scheme that handles variable-length queries efficiently. We demonstrate the effectiveness of our framework over existing techniques through an extensive set of experiments.
