Requirement-Based Query and Update Scheduling in Real-Time Data Warehouses

A typical real-time data warehouse continually receives read-only queries from users and write-only updates from a variety of external sources. Queries and updates compete for the same resources, so they may conflict, especially under high load. At the same time, users expect both short response times for their queries and low staleness in the query results, and it is challenging to satisfy these two requirements simultaneously. This paper proposes a requirement-based query and update scheduling algorithm (RQUS) that lets users express their real needs by specifying, when a query is submitted, the acceptable response time delay and the acceptable result staleness. RQUS dynamically adjusts the work mode of the system according to the changing requirements of users in order to allocate system resources to either queries or updates, and then prioritizes the query or update queue according to the work mode. A freshness monitor tracks the execution state of update tasks so that the global table can be maintained incrementally. Experimental results show that, overall, RQUS performs better than three traditional scheduling algorithms when user requirements change.
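The work-mode idea described above can be illustrated with a minimal Python sketch. This is not the RQUS algorithm itself, whose details the abstract does not give; it is a simplified illustration under stated assumptions. The names `Query`, `Update`, `choose_mode`, and `schedule`, the fixed per-task cost, and the rule "switch to update mode when any waiting query's staleness tolerance is exceeded" are all hypothetical choices made for this example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Query:
    deadline: float                               # arrival + acceptable response delay
    name: str = field(compare=False)
    max_staleness: float = field(compare=False)   # acceptable result staleness


@dataclass(order=True)
class Update:
    arrival: float                                # time the update reached the warehouse
    table: str = field(compare=False)


def choose_mode(queries, updates, now):
    """Pick the work mode ('query' or 'update') for the next step (assumed rule)."""
    if not updates:
        return "query"
    if not queries:
        return "update"
    # Staleness is approximated by the age of the oldest pending update.
    staleness = now - updates[0].arrival
    # Favor updates when some waiting query would see data that is too stale.
    if any(staleness > q.max_staleness for q in queries):
        return "update"
    return "query"


def schedule(queries, updates, now=0.0, cost=1.0):
    """Return the execution order as a list of ('Q' | 'U', label) pairs."""
    queries, updates = list(queries), list(updates)
    heapq.heapify(queries)    # earliest deadline first
    heapq.heapify(updates)    # oldest update first
    order = []
    while queries or updates:
        if choose_mode(queries, updates, now) == "query":
            order.append(("Q", heapq.heappop(queries).name))
        else:
            order.append(("U", heapq.heappop(updates).table))
        now += cost           # assumed: every task takes the same time
    return order


if __name__ == "__main__":
    qs = [Query(5.0, "q1", 10.0), Query(2.0, "q2", 0.5)]
    us = [Update(0.0, "sales")]
    print(schedule(qs, us, now=1.0))
```

In this sketch the query `q2` tolerates only 0.5 time units of staleness, so the pending update is applied before any query runs; with only staleness-tolerant queries waiting, the scheduler would stay in query mode and serve them by deadline.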
