Towards Online Multi-model Approximation of Time Series

The increasing use of sensor technology for various monitoring applications (e.g., air pollution, traffic, climate change) has led to an unprecedented volume of streaming data that has to be efficiently aggregated, stored, and retrieved. Real-time model-based data approximation and filtering is a common solution for reducing the storage (and communication) overhead. However, the selection of the most efficient model depends on the characteristics of the data stream, such as rate, burstiness, and data range, which cannot always be known a priori for (mobile) sensors and may even change dynamically. In this paper, we investigate the innovative concept of efficiently combining multiple approximation models in real time. Our approach dynamically adapts to the properties of the data stream and approximates each data segment with the most suitable model. As our experiments show, our multi-model approximation approach always produces no more data segments than the best individual model, and thus achieves a data compression ratio at least as high as that of individual linear models.
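To make the idea of choosing the most suitable model per segment concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes a maximum-error (L-infinity) bound `eps`, integer sample indices as timestamps, and two illustrative models chosen here for simplicity: a piecewise-constant model and a line anchored at the first sample of each segment. All names (`fit_constant`, `fit_linear`, `segment`, `reconstruct`) are hypothetical. At each segment start, every model is extended as far as the error bound allows, and the model that covers the most samples is kept.

```python
from typing import Callable, Dict, List, Tuple

# (model name, start index, exclusive end index, (intercept, slope))
Segment = Tuple[str, int, int, Tuple[float, float]]
Fit = Callable[[List[float], int, float], Tuple[int, Tuple[float, float]]]


def fit_constant(values: List[float], start: int, eps: float):
    """Extend a constant segment while max - min <= 2 * eps."""
    lo = hi = values[start]
    end = start + 1
    while end < len(values):
        new_lo, new_hi = min(lo, values[end]), max(hi, values[end])
        if new_hi - new_lo > 2 * eps:
            break
        lo, hi = new_lo, new_hi
        end += 1
    # The mid-range value is within eps of every covered sample.
    return end, ((lo + hi) / 2.0, 0.0)


def fit_linear(values: List[float], start: int, eps: float):
    """Extend a line anchored at values[start] while a feasible slope exists."""
    v0 = values[start]
    s_lo, s_hi = float("-inf"), float("inf")
    end = start + 1
    while end < len(values):
        dt = end - start
        new_lo = max(s_lo, (values[end] - eps - v0) / dt)
        new_hi = min(s_hi, (values[end] + eps - v0) / dt)
        if new_lo > new_hi:  # no slope keeps all samples within eps
            break
        s_lo, s_hi = new_lo, new_hi
        end += 1
    # A single-sample segment has no slope constraint; use 0 in that case.
    slope = 0.0 if end == start + 1 else (s_lo + s_hi) / 2.0
    return end, (v0, slope)


MODELS: Dict[str, Fit] = {"constant": fit_constant, "linear": fit_linear}


def segment(values: List[float], eps: float,
            models: Dict[str, Fit] = MODELS) -> List[Segment]:
    """Greedily keep, per segment, the model that covers the most samples."""
    segments: List[Segment] = []
    start = 0
    while start < len(values):
        best_name, best_end, best_params = "", start, (0.0, 0.0)
        for name, fit in models.items():
            end, params = fit(values, start, eps)
            if end > best_end:  # ties go to the model listed first
                best_name, best_end, best_params = name, end, params
        segments.append((best_name, start, best_end, best_params))
        start = best_end
    return segments


def reconstruct(segments: List[Segment]) -> List[float]:
    """Evaluate every segment's model at its covered sample indices."""
    out: List[float] = []
    for _, start, end, (intercept, slope) in segments:
        out.extend(intercept + slope * (t - start) for t in range(start, end))
    return out


if __name__ == "__main__":
    data = [5.0, 5.2, 4.9, 5.1, 5.0, 6.0, 7.0, 8.1, 9.0, 10.0, 10.1, 9.9, 10.0]
    for seg in segment(data, eps=0.25):
        print(seg)
```

On the flat-ramp-flat sample in the usage block, the sketch picks the constant model for the flat stretches and the linear model for the ramp. Because the linear model here is anchored at the first sample of each segment, this simplified greedy selection does not by itself carry the segment-count guarantee stated in the abstract; it only guarantees that every reconstructed value stays within `eps` of the original.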
