Selection of views to materialize in a data warehouse

A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resource, e.g., materialization time, storage space, etc. In This work, we have developed a theoretical framework for the general problem of selection of views in a data warehouse. We present polynomial-time heuristics for a selection of views to optimize total query response time under a disk-space constraint, for some important special cases of the general data warehouse scenario, viz.: 1) an AND view graph, where each query/view has a unique evaluation, e.g., when a multiple-query optimizer can be used to general a global evaluation plan for the queries, and 2) an OR view graph, in which any view can be computed from any one of its related views, e.g., data cubes. We present proofs showing that the algorithms are guaranteed to provide a solution that is fairly close to (within a constant factor ratio of) the optimal solution. We extend our heuristic to the general AND-OR view graphs. Finally, we address in detail the view-selection problem under the maintenance cost constraint and present provably competitive heuristics.

[1]  G. Nemhauser,et al.  Exceptional Paper—Location of Bank Accounts to Optimize Float: An Analytic Study of Exact and Approximate Algorithms , 1977 .

[2]  Nils J. Nilsson,et al.  Principles of Artificial Intelligence , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Anthony C. Klug Processing Multiple Queries in Database Systems , 1982, IEEE Database Eng. Bull..

[4]  Nick Roussopoulos The Logical Access Path Schema of a Database , 1982, IEEE Transactions on Software Engineering.

[5]  Timos K. Sellis,et al.  Multiple-query optimization , 1988, TODS.

[6]  Jaideep Srivastava,et al.  Multiple query optimization with Depth-First Branch-and-Bound and dynamic query ordering , 1993, CIKM '93.

[7]  W. H. Inmon,et al.  Rdb/VMS: Developing the Data Warehouse , 1993 .

[8]  Goetz Graefe,et al.  Multi-table joins through bitmapped join indices , 1995, SGMD.

[9]  Inderpal Singh Mumick,et al.  Maintenance of Materialized Views: Problems, Techniques, and Applications , 1999, IEEE Data Eng. Bull..

[10]  Ashish Gupta,et al.  Generalized Projections: A Powerful Approach To Aggregation , 1995 .

[11]  Per-Åke Larson,et al.  Eager Aggregation and Lazy Aggregation , 1995, VLDB.

[12]  Jennifer Widom,et al.  Research problems in data warehousing , 1995, CIKM '95.

[13]  Jennifer Widom,et al.  View maintenance in a warehousing environment , 1995, SIGMOD '95.

[14]  Kenneth A. Ross,et al.  Materialized view maintenance and integrity constraint checking: trading space for time , 1996, SIGMOD '96.

[15]  Jennifer Widom,et al.  A System Prototype for Warehouse View Maintenance , 1996, VIEWS.

[16]  Uriel Feige A threshold of ln n for approximating set cover (preliminary version) , 1996, STOC '96.

[17]  Venky Harinarayan,et al.  Implementing Data Cubes E ciently , 1996 .

[18]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[19]  Timos K. Sellis,et al.  Data Warehouse Configuration , 1997, VLDB.

[20]  Surajit Chaudhuri,et al.  An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server , 1997, VLDB.

[21]  Jian Yang,et al.  Algorithms for Materialized View Design in Data Warehousing Environment , 1997, VLDB.

[22]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[23]  Elena Baralis,et al.  Materialized Views Selection in a Multidimensional Database , 1997, VLDB.

[24]  Inderpal Singh Mumick,et al.  Maintenance of data cubes and summary tables in a warehouse , 1997, SIGMOD '97.

[25]  Jeffrey F. Naughton,et al.  Materialized View Selection for Multidimensional Datasets , 1998, VLDB.

[26]  U. Feige A threshold of ln n for approximating set cover , 1998, JACM.

[27]  Latha S. Colby,et al.  Redbrick Vista: Aggregate Computation and Management , 1998, ICDE 1998.

[28]  Microsoft Index Tuning Wizard for SQL Server 7.0 , 1998, SIGMOD Conference.

[29]  Jian Yang,et al.  Genetic Algorithm for Materialized View Selection in Data Warehouse Environments , 1999, DaWaK.

[30]  Inderpal Singh Mumick,et al.  Selection of Views to Materialize Under a Maintenance Cost Constraint , 1999, ICDT.

[31]  Howard J. Karloff,et al.  On the complexity of the view-selection problem , 1999, PODS '99.

[32]  Surajit Chaudhuri,et al.  Automated Selection of Materialized Views and Indexes in SQL Databases , 2000, VLDB.

[33]  Minsoo Lee,et al.  Speeding Up Materialized View Selection in Data Warehouses Using a Randomized Algorithm , 2001, Int. J. Cooperative Inf. Syst..

[34]  Panos Kalnis,et al.  View selection using randomized search , 2002, Data Knowl. Eng..

[35]  Rada Chirkova,et al.  A formal perspective on the view selection problem , 2002, The VLDB Journal.

[36]  Inderpal Singh Mumick,et al.  Selection of Views to Materialize in a Data Warehouse , 2005, IEEE Trans. Knowl. Data Eng..