Processing Forecasting Queries

Forecasting future events from historical data is useful in many domains, such as system management, adaptive query processing, environmental monitoring, and financial planning. We describe the Fa system, in which users and applications can pose declarative forecasting queries (both one-time and continuous) and get forecasts in real time along with accuracy estimates. Fa provides efficient algorithms that automatically generate execution plans for forecasting queries from a novel plan space comprising operators for transforming data, learning statistical models from data, and performing inference with the learned models. In addition, Fa supports adaptive query-processing algorithms that adapt plans for continuous forecasting queries to the time-varying properties of input data streams. We report an extensive experimental evaluation of Fa using synthetic datasets, datasets collected on a testbed, and two real datasets from production settings. Our experiments yield insights into plans for forecasting queries and demonstrate the effectiveness and scalability of our plan-selection algorithms.
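To make the three operator classes concrete, the following is a minimal sketch of one possible forecasting plan: a transformation step that pairs each observation with the value a fixed lead time later, a learning step that fits a model to those pairs, and an inference step that produces a forecast together with an accuracy estimate. It is not Fa's implementation. The function names (`make_lagged`, `fit_line`, `run_plan`), the choice of a least-squares line as the model, and the use of held-out mean absolute error as the accuracy estimate are all assumptions made for illustration.

```python
from statistics import mean

def make_lagged(series, lead):
    """Transform: pair each value x[t] with the value to forecast, x[t + lead]."""
    return [(series[t], series[t + lead]) for t in range(len(series) - lead)]

def fit_line(pairs):
    """Learn: ordinary least squares for y = a * x + b (illustrative model choice)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in pairs) / var if var else 0.0
    return a, my - a * mx

def run_plan(series, lead, holdout=3):
    """Run transform -> learn -> infer and return (forecast, accuracy estimate)."""
    pairs = make_lagged(series, lead)
    train, test = pairs[:-holdout], pairs[-holdout:]
    a, b = fit_line(train)
    # Accuracy estimate (assumed metric): mean absolute error on held-out pairs.
    mae = mean(abs(a * x + b - y) for x, y in test)
    # Infer: forecast the value `lead` steps after the latest observation.
    forecast = a * series[-1] + b
    return forecast, mae

# Hypothetical input: a linearly growing metric sampled at 12 time steps.
history = [2.0 * t + 1.0 for t in range(12)]
forecast, mae = run_plan(history, lead=2)
print(forecast, mae)
```

A plan space in this sense would contain many such pipelines that differ in the transformation applied, the model learned, or both. Selecting a plan then amounts to comparing candidates on estimated accuracy and cost.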
