Paper information - A skip-list approach for efficiently processing forecasting queries


Time series data is common in many settings, including scientific and financial applications. In these applications, the amount of data is often very large. We seek to support prediction queries over time series data. Prediction relies on model building, which can be too expensive to be practical if it is based on a large number of data points. We propose to use statistical hypothesis tests to choose a proper subset of data points to use for a given prediction query interval. This involves two steps: choosing a proper history length, and choosing the number of data points to use within this history. Further, we use an I/O-conscious skip list data structure to provide samples of the original data set. Based on the statistics collected for a query workload, which we model as a probability mass function (PMF) over query intervals, we devise a randomized algorithm that selects a set of pre-built models (PMs) to construct, subject to a maintenance cost constraint when there are updates. Given this set of PMs, we discuss query processing strategies not only for point queries, but also for range, aggregation, and JOIN queries. We conduct a comprehensive empirical study on real-world datasets to verify the effectiveness of our approaches and algorithms.
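The idea of using skip-list levels as progressively sparser samples of a time series can be illustrated with a minimal in-memory sketch. It is not the paper's I/O-conscious structure or its hypothesis tests: the function names (`build_levels`, `sample_history`, `fit_line`), the promotion probability `p`, the `max_points` budget, and the linear model are all assumptions made for illustration.

```python
import random


def build_levels(series, p=0.5, max_level=8, seed=0):
    """Give each point a random skip-list height; level k holds points taller than k.

    Level 0 contains every point, and each higher level keeps roughly a
    fraction p of the level below it, so higher levels act as sparser samples.
    """
    rng = random.Random(seed)
    levels = [[] for _ in range(max_level)]
    for t, value in enumerate(series):
        height = 1
        while height < max_level and rng.random() < p:
            height += 1
        for k in range(height):
            levels[k].append((t, value))
    return levels


def sample_history(levels, start, end, max_points):
    """Return points in [start, end] from the densest level that fits the budget.

    If even the sparsest level exceeds max_points, that level's window is
    returned as-is (an assumption of this sketch, not a rule from the paper).
    """
    window = []
    for level in levels:
        window = [(t, v) for (t, v) in level if start <= t <= end]
        if len(window) <= max_points:
            return window
    return window


def fit_line(points):
    """Ordinary least-squares line through (t, value) pairs; returns (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        return 0.0, mean_v  # a single time stamp: fall back to a constant model
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


if __name__ == "__main__":
    series = [2 * t + 1 for t in range(1000)]  # synthetic data, hypothetical
    levels = build_levels(series)
    history = sample_history(levels, start=0, end=999, max_points=100)
    slope, intercept = fit_line(history)
    print(len(history), slope * 1010 + intercept)  # sample size, forecast at t=1010
```

Here `max_points` stands in for the number of data points that the abstract says is chosen by statistical tests, and `start`/`end` stand in for the chosen history length; a real implementation would derive both from the query interval rather than take them as inputs.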

Stanley B. Zdonik | Tingjian Ge
