View selection using randomized search

An important issue in data warehouse development is the selection of a set of views to materialize in order to accelerate online analytical processing (OLAP) queries, given certain space and maintenance time constraints. Existing methods provide good results, but their high execution cost limits their applicability to large problems. In this paper, we explore the application of randomized local search algorithms to the view selection problem. The efficiency of the proposed techniques is evaluated using synthetic datasets, which cover a wide range of data and query distributions. The results show that randomized search methods provide near-optimal solutions in limited time and are robust to data and query skew. Furthermore, they can be easily adapted to various versions of the problem, including the simultaneous existence of size and time constraints, and view selection in dynamic environments. The proposed heuristics scale well with the problem size, and are therefore particularly useful for real-life warehouses, which need to be analyzed from numerous business perspectives.
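To make the idea of randomized local search over view selections concrete, the following is a minimal sketch rather than the algorithms evaluated in the paper. It assumes a simplified linear cost model in which a query is answered by scanning the smallest materialized view whose dimensions cover it, considers only a space constraint, and uses plain iterative improvement with toggle moves. All names (`iterative_improvement`, `total_cost`, the toy `sizes` and `queries`) are hypothetical and chosen for illustration only.

```python
import random

def query_cost(query, selected, sizes, top):
    """Rows scanned for `query`: the smallest materialized view covering it.
    The top view (all dimensions) is assumed to be always available."""
    return min(sizes[v] for v in selected | {top} if query <= v)

def total_cost(selected, queries, sizes, top):
    """Frequency-weighted query cost of a selection (assumed cost model)."""
    return sum(freq * query_cost(q, selected, sizes, top)
               for q, freq in queries.items())

def iterative_improvement(sizes, queries, top, budget,
                          steps=2000, restarts=5, seed=0):
    """Randomized local search: toggle one random view per step and keep
    the move if it respects the space budget and does not raise the cost."""
    rng = random.Random(seed)
    candidates = sorted((v for v in sizes if v != top), key=sorted)
    empty = frozenset()
    best, best_cost = empty, total_cost(empty, queries, sizes, top)
    for _ in range(restarts):
        # Each restart begins from the empty selection; only the random
        # sequence of moves differs between restarts.
        current, current_cost = empty, total_cost(empty, queries, sizes, top)
        for _ in range(steps):
            neighbor = current ^ {rng.choice(candidates)}  # add or drop a view
            if sum(sizes[v] for v in neighbor) > budget:
                continue  # violates the space constraint
            cost = total_cost(neighbor, queries, sizes, top)
            if cost <= current_cost:
                current, current_cost = neighbor, cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost

# Toy three-dimensional cube; sizes and query frequencies are made up.
f = frozenset
top = f("ABC")
sizes = {top: 100, f("AB"): 50, f("AC"): 60, f("BC"): 40,
         f("A"): 10, f("B"): 8, f("C"): 5}
queries = {f("A"): 10, f("B"): 5, f("C"): 5, f("AB"): 2, f("BC"): 1}

baseline_cost = total_cost(frozenset(), queries, sizes, top)
selection, selection_cost = iterative_improvement(sizes, queries, top, budget=60)
```

Here `selection` holds the views chosen within a space budget of 60 rows, and `selection_cost` can be compared with `baseline_cost`, the cost of answering every query from the top view alone. A maintenance-time constraint could be handled in the same loop by rejecting neighbors whose assumed update cost exceeds a second budget.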
