Exploiting Correlation and Parallelism of Materialized-View Recommendation for Distributed Data Warehouses

Many large enterprises require access to distributed data warehouses for business intelligence (BI) applications. Typically distributed data warehouses are integrated into a centralized data warehouse for the benefit of easy maintenance. However, this approach needs to overcome the complexity of data loading and job scheduling as well as scalability issues. On the other hand, the approach of a fully federated system may not be feasible for data intensive BI applications. The hybrid approach via intelligent data placement is more flexible and applicable than the centralized or full-federation configuration. The current implementation of the hybrid approach to integrating distributed data warehouses is to aggregate selected data from various remote sources as materialized views and cache them at the federation server to improve the performance of complex BI query workloads. In this paper, we propose an improvement that recommends materialized query tables (MQTs) for backend servers for the benefits of load distribution and easy maintenance of aggregated data in conjunction with the current hybrid approach of data placement. Our approach considers the correlation between backend servers and recommends MQTs that are well coordinated among the backend servers and optimized for a given workload. We also exploit the parallelism property among the backend servers to make our approach run almost linearly (in contrast to exponentially) with respect to the number of backend servers, without sacrificing its recommendation quality. Experimental evaluations validate the effectiveness and efficiency of our approach.

[1]  Jonathan Goldstein,et al.  MTCache: transparent mid-tier database caching in SQL server , 2004, Proceedings. 20th International Conference on Data Engineering.

[2]  Sriram Padmanabhan,et al.  DBProxy: a dynamic data cache for web applications , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[3]  Hamid Pirahesh,et al.  Recommending materialized views and indexes with the IBM DB2 design advisor , 2004 .

[4]  Times-Ten Team Mid-tier caching: the TimesTen approach , 2002, SIGMOD '02.

[5]  Laura M. Haas,et al.  Garlic: a new flavor of federated query processing for DB2 , 2002, SIGMOD '02.

[6]  Sam Lightstone,et al.  DB2 Design Advisor: Integrated Automatic Physical Database Design , 2004, VLDB.

[7]  Hamid Pirahesh,et al.  Cache Tables: Paving the Way for an Adaptive Database Cache , 2003, VLDB.

[8]  Hamid Pirahesh,et al.  Recommending materialized views and indexes with the IBM DB2 design advisor , 2004, International Conference on Autonomic Computing, 2004. Proceedings..

[9]  Wen-Syan Li,et al.  Load Balancing for Multi-tiered Database Systems through Autonomic Placement of Materialized Views , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[10]  Surajit Chaudhuri,et al.  Automated Selection of Materialized Views and Indexes in SQL Databases , 2000, VLDB.

[11]  Jonathan Goldstein,et al.  Relaxed currency and consistency: how to say "good enough" in SQL , 2004, SIGMOD '04.

[12]  Jonathan Goldstein,et al.  Optimizing queries using materialized views: a practical, scalable solution , 2001, SIGMOD '01.