Towards materialized view selection for distributed databases

Materialized views (MV) can significantly improve the query performance of relational databases. In this paper, we consider MVs to optimize complex scenarios where many heterogeneous nodes with different resource constraints (e.g., CPU, IO and network bandwidth) query and update numerous tables on different nodes. Such problems are typical for large enterprises, e.g., global retailers storing thousands of relations on hundreds of nodes at different subsidiaries. Choosing which views to materialize in a distributed, complex scenario is NP-hard. Furthermore, the solution space is huge, and the large number of input factors results in non-monotonic cost models. This prohibits the straightforward use of brute-force algorithms, greedy approaches or proposals from organic computing. For the same reason, all solutions for choosing MVs we are aware of do not consider either distributed settings or update costs. In this paper we describe an algorithmic framework which restricts the sets of considered MVs so that a genetic algorithm can be applied. In order to let the genetic algorithm converge quickly, we generate initial populations based on knowledge on database tuning, and devise a selection function which restricts the solution space by taking the similarity of MV configurations into account. We evaluate our approach both with artificial settings and a real-world RFID scenario from retail. For a small setting consisting of 24 tables distributed over 9 nodes, an exhaustive search needs 10 hours processing time. Our approach derives a comparable set of MVs within 30 seconds. Our approach scales well: Within 15 minutes it chooses a set of MVs for a real-world scenario consisting of 1,000 relations, 400 hosts, and a workload of 3,000 queries and updates.

[1]  Hamid Pirahesh,et al.  Recommending materialized views and indexes with the IBM DB2 design advisor , 2004, International Conference on Autonomic Computing, 2004. Proceedings..

[2]  Surajit Chaudhuri,et al.  Automated Selection of Materialized Views and Indexes in SQL Databases , 2000, VLDB.

[3]  Surajit Chaudhuri,et al.  Database Tuning Advisor for Microsoft SQL Server 2005 , 2004, VLDB.

[4]  Surajit Chaudhuri,et al.  An overview of query optimization in relational systems , 1998, PODS.

[5]  Jian Yang,et al.  Genetic Algorithm for Materialized View Selection in Data Warehouse Environments , 1999, DaWaK.

[6]  Beng Chin Ooi,et al.  Towards self-tuning data placement in parallel database systems , 2000, SIGMOD '00.

[7]  Alon Y. Halevy,et al.  MiniCon: A scalable algorithm for answering queries using views , 2000, The VLDB Journal.

[8]  Inderpal Singh Mumick,et al.  Selection of Views to Materialize Under a Maintenance Cost Constraint , 1999, ICDT.

[9]  Jian Yang,et al.  Algorithms for Materialized View Design in Data Warehousing Environment , 1997, VLDB.

[10]  Krithi Ramamritham,et al.  Materialized view selection and maintenance using multi-query optimization , 2000, SIGMOD '01.

[11]  Gang Zhou,et al.  Towards the Study of Performance Trade-offs Between Materialized and Virtual Integrated Views , 1996, VIEWS.

[12]  Inderpal Singh Mumick,et al.  Selection of Views to Materialize in a Data Warehouse , 2005, IEEE Trans. Knowl. Data Eng..

[13]  Hamid Pirahesh,et al.  Recommending materialized views and indexes with the IBM DB2 design advisor , 2004 .

[14]  Zbigniew Michalewicz,et al.  Parameter Control in Evolutionary Algorithms , 2007, Parameter Setting in Evolutionary Algorithms.

[15]  Donald Kossmann,et al.  Iterative dynamic programming: a new class of query optimization algorithms , 2000, TODS.

[16]  Zhenyu Liu,et al.  Extended Derivation Cube Based View Materialization Selection in Distributed Data Warehouse , 2005, WAIM.

[17]  Guy M. Lohman,et al.  R* optimizer validation and performance evaluation for local queries , 1986, SIGMOD '86.

[18]  Hicham G. Elmongui,et al.  Lazy Maintenance of Materialized Views , 2007, VLDB.

[19]  Jeffrey F. Naughton,et al.  Materialized View Selection for Multidimensional Datasets , 1998, VLDB.

[20]  Mohamed Ziauddin,et al.  Materialized Views in Oracle , 1998, VLDB.

[21]  Elena Baralis,et al.  Materialized Views Selection in a Multidimensional Database , 1997, VLDB.

[22]  Tianxiao Liu Cost-Based Query Optimization in a Heterogeneous Distributed Semi-Structured Environment , 2007 .

[23]  Wen-Syan Li,et al.  Load balancing and data placement for multi-tiered database systems , 2007, Data Knowl. Eng..

[24]  Haifeng Jiang,et al.  Exploiting Correlation and Parallelism of Materialized-View Recommendation for Distributed Data Warehouses , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[25]  Surajit Chaudhuri,et al.  Database tuning advisor for microsoft SQL server 2005: demo , 2005, SIGMOD '05.

[26]  Maria E. Orlowska,et al.  Efficient refreshment of materialized views with multiple sources , 1999, CIKM '99.

[27]  Klemens Böhm,et al.  Tagmark: reliable estimations of RFID tags for business processes , 2008, KDD.

[28]  Luping Ding,et al.  Dynamic Materialized Views , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[29]  David Beasley,et al.  An overview of genetic algorithms: Part 1 , 1993 .

[30]  Wolfgang Lehner,et al.  On solving the view selection problem in distributed data warehouse architectures , 2003, 15th International Conference on Scientific and Statistical Database Management, 2003..

[31]  Kamalakar Karlapalem,et al.  On efficient storage space distribution among materialized views and indices in data warehousing environments , 2000, CIKM '00.