Physical database design for data warehouses

Data warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator will often materialize views defined on the local relations to support common or complicated queries. Unfortunately, the requirement to keep the views consistent with the local relations creates additional overhead when the remote sources change. The warehouse is often kept only loosely consistent with the sources: it is periodically refreshed with changes sent from the source. When this happens, the warehouse is taken off-line until the local relations and materialized views can be updated. Clearly, the users would prefer as little down time as possible. Often the down time can be reduced by adding carefully selected materialized views or indexes to the physical schema. This paper studies how to select the sets of supporting views and of indexes to materialize to minimize the down time. We call this the view index selection (VIS) problem. We present an A* search based solution to the problem as well as rules of thumb. We also perform additional experiments to understand the space-time tradeoff as it applies to data warehouses.

[1]  Venky Harinarayan,et al.  Implementing Data Cubes E ciently , 1996 .

[2]  Paolo Tiberio,et al.  Estimating the cost of updates in a relational database , 1985, TODS.

[3]  Divesh Srivastava,et al.  Answering Queries Using Views. , 1999, PODS 1995.

[4]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[5]  Eric N. Hanson,et al.  A performance comparison of the Rete and TREAT algorithms for testing database rule conditions , 1992, [1992] Eighth International Conference on Data Engineering.

[6]  Henk M. Blanken,et al.  On the Selection of Secondary Indices in Relational Databases , 1993, Data Knowl. Eng..

[7]  Charles L. Forgy,et al.  Rete: a fast algorithm for the many pattern/many object pattern match problem , 1991 .

[8]  Nick Roussopoulos,et al.  View indexing in relational databases , 1982, TODS.

[9]  Arie Segev,et al.  Updating Distributed Materialized Views , 1989, IEEE Trans. Knowl. Data Eng..

[10]  V. S. Subrahmanian,et al.  Maintaining views incrementally , 1993, SIGMOD Conference.

[11]  Eric N. Hanson,et al.  Rule condition testing and action execution in Ariel , 1992, SIGMOD '92.

[12]  Mireille Régnier,et al.  An Adaptive Algorithm for Incremental Evaluation of Production Rules in Databases , 1993, VLDB.

[13]  Kenneth A. Ross,et al.  Materialized view maintenance and integrity constraint checking: trading space for time , 1996, SIGMOD '96.

[14]  Arie Segev,et al.  Data Management for Large Rule Systems , 1991, VLDB.

[15]  Dennis Shasha,et al.  A Framework for Automating Physical Database Design , 1991, VLDB.

[16]  Leonid Libkin,et al.  Incremental maintenance of views with duplicates , 1995, SIGMOD '95.

[17]  Daniel P. Miranker TREAT: a better match algorithm for AI production systems , 1987, AAAI 1987.

[18]  Per-Åke Larson,et al.  Updating derived relations: detecting irrelevant and autonomously computable updates , 1986, VLDB.

[19]  Guy M. Lohman,et al.  Index scans using a finite LRU buffer: a validated I/O model , 1989, ACM Trans. Database Syst..

[20]  Nils J. Nilsson,et al.  Problem-solving methods in artificial intelligence , 1971, McGraw-Hill computer science series.

[21]  Arie Segev,et al.  Optimal update policies for distributed materialized views , 1991 .

[22]  Timos K. Sellis,et al.  Multiple-query optimization , 1988, TODS.

[23]  Marvin H. Solomon,et al.  The GMAP: a versatile tool for physical data independence , 1996, The VLDB Journal.