Cost-based optimization of decision support queries using transient-views

Next generation decision support applications, besides being capable of processing huge amounts of data, require the ability to integrate and reason over data from multiple, heterogeneous data sources. Often, these data sources differ in a variety of aspects such as their data models, the query languages they support, and their network protocols. Also, typically they are spread over a wide geographical area. The cost of processing decision support queries in such a setting is quite high. However, processing these queries often involves redundancies such as repeated access of same data source and multiple execution of similar processing sequences. Minimizing these redundancies would significantly reduce the query processing cost. In this paper, we (1) propose an architecture for processing complex decision support queries involving multiple, heterogeneous data sources; (2) introduce the notion of transient-views — materialized views that exist only in the context of execution of a query — that is useful for minimizing the redundancies involved in the execution of these queries; (3) develop a cost-based algorithm that takes a query plan as input and generates an optimal “covering plan”, by minimizing redundancies in the original plan; (4) validate our approach by means of an implementation of the algorithms and a detailed performance study based on TPC-D benchmark queries on a commercial database system; and finally, (5) compare and contrast our approach with work in related areas, in particular, the areas of answering queries using views and optimization using common sub-expressions. Our experiments demonstrate the practicality and usefulness of transient-views in significantly improving the performance of decision support queries.

[1]  Serge Abiteboul,et al.  Querying Semi-Structured Data , 1997, Encyclopedia of Database Systems.

[2]  Don Batory,et al.  Query Processing in Database Systems , 2011, Topics in Information Systems.

[3]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[4]  Nick Roussopoulos,et al.  View indexing in relational databases , 1982, TODS.

[5]  John Grant,et al.  Optimization in Deductive and Conventional Relational Database Systems , 1979, Advances in Data Base Theory.

[6]  Divesh Srivastava,et al.  Answering Queries Using Views. , 1999, PODS 1995.

[7]  Patrick A. V. Hall,et al.  Optimization of a Single Relation Expression in a Relational Data Base System , 1976, IBM J. Res. Dev..

[8]  Jennifer Widom,et al.  Object exchange across heterogeneous information sources , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[9]  Laks V. S. Lakshmanan,et al.  SchemaSQL - A Language for Interoperability in Relational Multi-Database Systems , 1996, VLDB.

[10]  Jennifer Widom,et al.  The TSIMMIS Project: Integration of Heterogeneous Information Sources , 1994, IPSJ.

[11]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[12]  Matthias Jarke,et al.  Common Subexpression Isolation in Multiple Query Optimization , 1984, Query Processing in Database Systems.

[13]  Serge Abiteboul,et al.  From structured documents to novel query facilities , 1994, SIGMOD '94.

[14]  Timos K. Sellis,et al.  Multiple-query optimization , 1988, TODS.

[15]  Divesh Srivastava,et al.  Answering Queries with Aggregation Using Views , 1996, VLDB.

[16]  Guy M. Lohman,et al.  Query Optimization in the IBM DB2 Family. , 1993 .

[17]  Sheldon J. Finkelstein Common expression analysis in database applications , 1982, SIGMOD '82.

[18]  Hamid Pirahesh,et al.  Extensible/rule based query rewrite optimization in Starburst , 1992, SIGMOD '92.

[19]  Nick Roussopoulos The Logical Access Path Schema of a Database , 1982, IEEE Transactions on Software Engineering.

[20]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[21]  N. Roussopolous The Logical Access Path Schema of a Database , 1982 .

[22]  Paolo Merialdo,et al.  Structures in the Web , 1997, Sistemi Evoluti per Basi di Dati.

[23]  Laura M. Haas,et al.  Optimizing Queries Across Diverse Data Sources , 1997, VLDB.

[24]  Venky Harinarayan,et al.  Implementing Data Cubes E ciently , 1996 .

[25]  Eugene Wong,et al.  Decomposition—a strategy for query processing , 1976, TODS.

[26]  Hamid Pirahesh,et al.  Extensible query processing in starburst , 1989, SIGMOD '89.

[27]  Joann J. Ordille,et al.  Querying Heterogeneous Information Sources Using Source Descriptions , 1996, VLDB.

[28]  Weimin Du,et al.  The Pegasus heterogeneous multidatabase system , 1991, Computer.

[29]  Mary Roth,et al.  Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources , 1997, VLDB.

[30]  Matthias Jarke,et al.  Query Optimization in Database Systems , 1984, CSUR.

[31]  Divesh Srivastava,et al.  Semantic Data Caching and Replacement , 1996, VLDB.

[32]  Patrick Valduriez,et al.  Scaling heterogeneous databases and the design of Disco , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.

[33]  Kenneth A. Ross,et al.  Materialized view maintenance and integrity constraint checking: trading space for time , 1996, SIGMOD '96.

[34]  Kyuseok Shim,et al.  Optimizing queries with materialized views , 1995, Proceedings of the Eleventh International Conference on Data Engineering.