Online summarization of dynamic time series data

Managing large-scale time series databases has recently attracted significant attention in the database community. Related fundamental problems such as dimensionality reduction, transformation, pattern mining, and similarity search have been studied extensively. Although time series data are dynamic by nature, as in data streams, current solutions to these problems have mostly targeted static time series databases. In this paper, we first propose a framework for online summary generation over large-scale, dynamic time series data such as data streams. We then propose online transform-based summarization techniques for data streams that can be updated in constant time and space. We present both exact and approximate versions of the proposed techniques and provide error bounds for the approximate case. One of the main contributions of this paper is an extensive performance analysis: our experiments evaluate the quality of the online summaries for point, range, and k-NN queries using real-life dynamic data sets of substantial size.
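To make the constant-cost update concrete, here is a minimal sketch. It assumes that the transform is the DFT over a sliding window of length N and that only the first f coefficients are kept. It uses the well-known momentary (sliding) Fourier transform recurrence X_k(t+1) = e^{j2πk/N} (X_k(t) - x_t + x_{t+N}), which is not necessarily the exact procedure used in the paper. The class `SlidingDFT`, its methods, and the parameters `window` and `num_coeffs` are hypothetical names chosen for illustration. The `approximate` method rebuilds a window value from the retained coefficients. By Parseval's theorem, its total squared error equals (1/N) times the energy of the dropped coefficients. This illustrates the kind of error bound an approximate summary can carry; the paper's own bounds may differ.

```python
import cmath
import math
from collections import deque


class SlidingDFT:
    """Hypothetical sliding-window DFT summary (momentary Fourier transform update).

    Keeps the first `num_coeffs` DFT coefficients of the most recent `window`
    values. Each arrival updates every retained coefficient with O(1) work, so
    the per-arrival cost is O(num_coeffs), independent of the window length.
    """

    def __init__(self, window, num_coeffs):
        # k and N-k must stay distinct for the conjugate-symmetric reconstruction.
        if num_coeffs < 1 or 2 * (num_coeffs - 1) >= window:
            raise ValueError("need num_coeffs >= 1 and 2*(num_coeffs-1) < window")
        self.window = window
        self.num_coeffs = num_coeffs
        # Raw window, oldest first; used only to know which value expires.
        self.buffer = deque([0.0] * window, maxlen=window)
        # The DFT of an all-zero window is all zeros, so this start state is exact.
        self.coeffs = [0j] * num_coeffs
        # Rotation factors e^{+j 2 pi k / N}, applied after each one-step shift.
        self.rot = [cmath.exp(2j * math.pi * k / window) for k in range(num_coeffs)]

    def push(self, value):
        """Slide the window by one value and update every kept coefficient."""
        expiring = self.buffer[0]
        self.buffer.append(value)  # maxlen makes the deque drop the expiring value
        for k in range(self.num_coeffs):
            self.coeffs[k] = (self.coeffs[k] - expiring + value) * self.rot[k]

    def approximate(self, n):
        """Approximate the value at window position n (0 = oldest).

        Low-pass reconstruction for a real signal: coefficients 0..num_coeffs-1
        plus their complex-conjugate mirrors N-k; all other coefficients are zero.
        """
        total = self.coeffs[0].real
        for k in range(1, self.num_coeffs):
            total += 2 * (self.coeffs[k] * cmath.exp(2j * math.pi * k * n / self.window)).real
        return total / self.window


def direct_dft(values, num_coeffs):
    """Reference O(N * num_coeffs) DFT of a full window, used for checking."""
    n = len(values)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(values))
            for k in range(num_coeffs)]


if __name__ == "__main__":
    # Synthetic stream, for illustration only.
    stream = [math.sin(0.3 * t) + 0.1 * (t % 5) for t in range(200)]
    summary = SlidingDFT(window=32, num_coeffs=4)
    for x in stream:
        summary.push(x)
    exact = direct_dft(stream[-32:], 4)
    drift = max(abs(a - b) for a, b in zip(summary.coeffs, exact))
    print(f"max |incremental - direct| = {drift:.2e}")
    print(f"newest value {stream[-1]:.3f}, approximated {summary.approximate(31):.3f}")
```

The exact incremental update still stores the raw window so that it knows which value leaves it. The work per arriving value, however, is O(f) and does not grow with N or with the length of the stream. Over very long streams, repeated multiplication by the rotation factors can accumulate floating-point drift. A practical implementation might therefore periodically recompute the coefficients directly.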
