Materializing views with minimal size to answer queries

In this paper we study the following problem. Given a database and a set of queries, we want to find, in advance, a set of views that can compute the answers to the queries, such that the size of the viewset (i.e., the amount of space, in bytes, required to store the viewset) is minimal on the given database. This problem is important for many applications such as distributed databases, data warehousing, and data integration. We explore the decidability and complexity of the problem for workloads of conjunctive queries. We show that results differ significantly depending on whether the workload queries have self-joins. If queries can have self-joins, then a disjunctive viewset can be a better solution than any set of conjunctive views. We show that the problem of finding a minimal-size disjunctive viewset is decidable, and give an upper bound on its complexity. If workload queries cannot have self-joins, there is no need to consider disjunctive viewsets, and we show that the problem is in NP. We describe a very compact search space of conjunctive views, which contains all views in at least one optimal disjunctive viewset. We give a dynamic-programming algorithm for finding minimal-size disjunctive viewsets for queries without self-joins, and discuss heuristics to make the algorithm efficient.

[1]  Alon Y. Halevy,et al.  Answering queries using views: A survey , 2001, The VLDB Journal.

[2]  Diego Calvanese,et al.  Answering regular path queries using views , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[3]  Mohamed Ziauddin,et al.  Materialized Views in Oracle , 1998, VLDB.

[4]  Timos K. Sellis,et al.  Data Warehouse Configuration , 1997, VLDB.

[5]  Hamid Pirahesh,et al.  Answering complex SQL queries using automatic summary tables , 2000, SIGMOD '00.

[6]  Gio Wiederhold,et al.  Mediators in the architecture of future information systems , 1992, Computer.

[7]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[8]  Elena Baralis,et al.  Materialized Views Selection in a Multidimensional Database , 1997, VLDB.

[9]  Chen Li,et al.  Minimizing View Sets without Losing Query-Answering Power , 2001, ICDT.

[10]  Divesh Srivastava,et al.  Answering Queries Using Views. , 1999, PODS 1995.

[11]  Inderpal Singh Mumick,et al.  Selection of Views to Materialize in a Data Warehouse , 2005, IEEE Trans. Knowl. Data Eng..

[12]  Chen Li,et al.  Generating efficient plans for queries using views , 2001, SIGMOD '01.

[13]  Jian Yang,et al.  Algorithms for Materialized View Design in Data Warehousing Environment , 1997, VLDB.

[14]  Hakan Hacigümüs,et al.  Providing database as a service , 2002, Proceedings 18th International Conference on Data Engineering.

[15]  Rada Chirkova,et al.  Minimizing Data-Communication Costs by Decomposing Query Results in Client-Server Environments , 2003 .

[16]  Timos K. Sellis,et al.  Designing the Global Data Warehouse with SPJ Views , 1999, CAiSE.

[17]  Jeffrey D. Ullman,et al.  Principles of Database and Knowledge-Base Systems, Volume II , 1988, Principles of computer science series.

[18]  Mihalis Yannakakis,et al.  Equivalences Among Relational Expressions with the Union and Difference Operators , 1980, J. ACM.

[19]  Patrick Valduriez,et al.  Principles of Distributed Database Systems , 1990 .

[20]  Eugene Wong,et al.  Query processing in a system for distributed databases (SDD-1) , 1981, TODS.

[21]  Donald D. Chamberlin,et al.  Access Path Selection in a Relational Database Management System , 1989 .

[22]  Ashok K. Chandra,et al.  Optimal implementation of conjunctive queries in relational data bases , 1977, STOC '77.

[23]  Rada Chirkova,et al.  The view-selection problem has an exponential-time lower bound for conjunctive queries and views , 2002, PODS '02.

[24]  Mihalis Yannakakis,et al.  Algorithms for Acyclic Database Schemes , 1981, VLDB.

[25]  Rada Chirkova,et al.  A formal perspective on the view selection problem , 2002, The VLDB Journal.

[26]  Laura M. Haas,et al.  Optimizing Queries Across Diverse Data Sources , 1997, VLDB.

[27]  Maurizio Lenzerini,et al.  Data integration: a theoretical perspective , 2002, PODS.

[28]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[29]  Divesh Srivastava,et al.  Answering Queries Using Views. , 1999, PODS 1995.

[30]  Jeffrey D. Uuman Principles of database and knowledge- base systems , 1989 .

[31]  Jonathan Goldstein,et al.  Optimizing queries using materialized views: a practical, scalable solution , 2001, SIGMOD '01.

[32]  Surajit Chaudhuri,et al.  Automated Selection of Materialized Views and Indexes in SQL Databases , 2000, VLDB.

[33]  Rada Chirkova,et al.  Linearly Bounded Reformulations of Conjunctive Databases , 2000, Computational Logic.

[34]  Praveen Seshadri,et al.  An algebraic compression framework for query results , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[35]  Alon Y. Halevy,et al.  MiniCon: A scalable algorithm for answering queries using views , 2000, The VLDB Journal.

[36]  Stefano Ceri,et al.  Distributed Databases: Principles and Systems , 1984 .

[37]  Diego Calvanese,et al.  Answering Queries Using Views over Description Logics Knowledge Bases , 2000, AAAI/IAAI.

[38]  Hakan Hacigümüs,et al.  Executing SQL over encrypted data in the database-service-provider model , 2002, SIGMOD '02.