Managing Dynamic Mixed Workloads for Operational Business Intelligence

As data warehousing technology becomes ubiquitous in business, companies increasingly rely on the information in their data warehouses to inform operational decisions. This information, known as business intelligence (BI), has traditionally taken the form of nightly or monthly reports and batched analytical queries run at specific times of day. However, as the time needed for data to migrate into data warehouses has decreased and the amount of data stored has increased, business intelligence has come to include metrics, streaming analysis, and reports with expected delivery times measured in hours, minutes, or seconds. The challenge is that, to meet the required response times for these operational BI queries, a warehouse must support multiple types of queries at any given time, possibly with a different set of performance objectives for each type. In this paper, we discuss why these dynamic mixed workloads make workload management for operational BI databases so challenging, review current and proposed attempts to address these challenges, and describe our own approach. We have carried out an extensive set of experiments and report on a few of our results.
