PQR: Predicting Query Execution Times for Autonomous Workload Management

Modern enterprise data warehouses have complex workloads that are notoriously difficult to manage. A key ingredient in managing these workloads is an estimate of how long a query will take to execute, and an accurate estimate of this execution time is critical to building self-managing, enterprise-class data warehouses. In this paper we study the problem of predicting the execution time of a query on a loaded data warehouse whose workload changes dynamically. We take a machine learning approach: we combine the query plan with the observed load vector of the system and use the resulting vector to predict the query's execution time. Predictions are expressed as time ranges. We validate our solution using real databases and real workloads, and show experimentally that our machine learning approach works well. This technology is slated for incorporation into a commercial, enterprise-class DBMS.
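The core idea of concatenating a plan-derived feature vector with the system load vector and classifying the result into a time range can be sketched as follows. This is a minimal illustration under stated assumptions, not the learner PQR actually uses: the k-nearest-neighbour classifier is a swapped-in stand-in, and the time-range boundaries, the feature choices (estimated cost, join count, active queries, CPU utilisation) and the example history are all hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical time-range boundaries in seconds; not taken from the paper.
TIME_RANGES = [(0.0, 10.0), (10.0, 60.0), (60.0, 600.0), (600.0, float("inf"))]


def to_range(seconds):
    """Map an observed execution time to the index of its time range."""
    for i, (lo, hi) in enumerate(TIME_RANGES):
        if lo <= seconds < hi:
            return i
    raise ValueError(f"execution time out of range: {seconds}")


def combine(plan_features, load_vector):
    """Concatenate query-plan features with the observed system load vector."""
    return [float(x) for x in plan_features] + [float(x) for x in load_vector]


class KNNRangePredictor:
    """Stand-in learner: k-nearest neighbours over combined vectors, voting on ranges."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # (combined_vector, range_index) pairs

    def fit(self, history):
        """history: iterable of (plan_features, load_vector, observed_seconds)."""
        self.examples = [(combine(p, l), to_range(t)) for p, l, t in history]
        return self

    def predict(self, plan_features, load_vector):
        """Return the predicted (low, high) time range in seconds."""
        x = combine(plan_features, load_vector)
        # Unscaled Euclidean distance: large-magnitude features such as cost dominate.
        nearest = sorted(self.examples, key=lambda ex: dist(ex[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        # Most votes wins; ties go to the longer (more conservative) range.
        best, _ = max(votes.items(), key=lambda kv: (kv[1], kv[0]))
        return TIME_RANGES[best]


# Hypothetical training history:
# (plan features [est_cost, num_joins], load vector [active_queries, cpu_util], seconds)
EXAMPLE_HISTORY = [
    ([100, 1], [2, 0.20], 5.0),
    ([120, 1], [3, 0.30], 8.0),
    ([110, 1], [2, 0.25], 4.0),
    ([5000, 4], [10, 0.90], 300.0),
    ([5200, 4], [12, 0.80], 450.0),
    ([4800, 3], [9, 0.85], 200.0),
]

if __name__ == "__main__":
    model = KNNRangePredictor(k=3).fit(EXAMPLE_HISTORY)
    print(model.predict([105, 1], [2, 0.20]))    # (0.0, 10.0)
    print(model.predict([5100, 4], [11, 0.85]))  # (60.0, 600.0)
```

A real deployment would likely normalise each feature before computing distances, since raw optimizer cost dominates the Euclidean distance in this toy example, and would derive the range boundaries from the observed workload rather than fixing them as constants.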
