A Randomized Approach for the Incremental Design of an Evolving Data Warehouse

A Data Warehouse (DW) can be used to integrate data from multiple distributed data sources. A DW can be seen as a set of materialized views that determine its schema and content in terms of the schemas and contents of the data sources. DW applications require high query performance. For this reason, designing a DW typically consists of selecting a set of views to materialize that can answer a given set of user queries. However, the cost of answering the queries has to be balanced against the cost of maintaining the materialized views. In an evolving DW application, new queries must be answered by the DW over time. An incremental selection of materialized views uses the views already materialized in the DW to answer parts of the new queries and avoids re-implementing the DW from scratch. This incremental design problem is complex, and an exhaustive approach is not feasible. We have developed a randomized approach for incrementally selecting a set of views that can answer the input user queries locally while minimizing a combination of the query evaluation cost and the view maintenance cost. In this process we exploit "common sub-expressions" among new queries and between new queries and old views. Our approach has been implemented, and we report on its experimental evaluation.
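To make the trade-off concrete, the following is a minimal sketch of a randomized incremental view selection. It is not the algorithm of the paper: the abstract does not specify the search procedure, so generic simulated annealing is swapped in here. The cost model is a deliberately simplified assumption (each query has a fixed cost when computed without views and a fixed cheaper cost per usable view), and it does not model common sub-expressions. All names (`total_cost`, `select_views`, `weight`, the view names) are hypothetical.

```python
import math
import random


def total_cost(selected, queries, maintenance, weight=1.0):
    """Query evaluation cost plus weighted view maintenance cost (toy model).

    queries: list of (base_cost, {view_name: cost_if_view_is_materialized}).
    maintenance: {view_name: maintenance_cost}.
    """
    query_cost = 0.0
    for base_cost, view_costs in queries:
        usable = [c for v, c in view_costs.items() if v in selected]
        # Each query uses its cheapest available option.
        query_cost += min(usable + [base_cost])
    return query_cost + weight * sum(maintenance[v] for v in selected)


def select_views(old_views, candidates, queries, maintenance, weight=1.0,
                 steps=2000, temp=10.0, cooling=0.995, seed=0):
    """Simulated annealing over subsets of candidate views.

    Views in old_views are already materialized and are never dropped
    (the incremental setting); only candidates are toggled. The two sets
    are assumed to be disjoint.
    """
    rng = random.Random(seed)
    candidates = sorted(candidates)
    current = set(old_views)
    current_cost = total_cost(current, queries, maintenance, weight)
    best, best_cost = set(current), current_cost
    if not candidates:
        return best, best_cost
    for _ in range(steps):
        # Neighbor state: add or remove one randomly chosen candidate view.
        neighbor = current ^ {rng.choice(candidates)}
        cost = total_cost(neighbor, queries, maintenance, weight)
        delta = cost - current_cost
        # Always accept improvements; accept worse states with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = neighbor, cost
            if cost < best_cost:
                best, best_cost = set(neighbor), cost
        temp = max(temp * cooling, 1e-9)
    return best, best_cost


# Hypothetical example: one old view, three candidate views, three new queries.
old_views = {"v_sales"}
candidates = {"v_by_region", "v_by_month", "v_joined"}
maintenance = {"v_sales": 10, "v_by_region": 4, "v_by_month": 6, "v_joined": 50}
queries = [
    (100, {"v_sales": 40, "v_by_region": 5}),
    (120, {"v_sales": 60, "v_by_month": 8}),
    (80, {"v_joined": 70}),
]
chosen, cost = select_views(old_views, candidates, queries, maintenance)
print(sorted(chosen), cost)
```

Because the search starts from the existing views and only toggles new candidates, the best state it reports never costs more than leaving the DW unchanged. The `weight` parameter controls how strongly maintenance cost is penalized relative to query evaluation cost.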
