The Query Clustering Problem: A Set Partitioning Approach

We address the query clustering problem, which involves determining globally optimal execution strategies for a set of queries. The need to process a set of queries together often arises in deductive database systems, scientific database systems, large bibliographic retrieval systems, and several other database applications. We approach the optimization problem from the perspective of overlaps in data requirements, and model the batched operations using a set-partitioning approach. Within this model, we first consider the case of m queries, each involving a two-way join operation, and develop a recursive methodology to determine all the processing strategies for this case. Next, we establish certain dominance properties among the strategies, and develop exact as well as heuristic algorithms for selecting an appropriate strategy. We then extend this analysis to a clustering approach and outline a framework for optimizing multiway joins. The results show that the proposed approach is viable and efficient, and that it can easily be incorporated into the query processing component of most database systems.
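
To make the set-partitioning view concrete, the sketch below recursively enumerates every way of grouping a small batch of queries into clusters that could be processed together. It is only an illustration of set-partition enumeration in general, not the recursive methodology developed in the paper; the function name `set_partitions` and the query labels `q1`, `q2`, `q3` are assumptions introduced here.

```python
from typing import Iterator, List


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Recursively yield every partition of `items` into non-empty blocks."""
    if not items:
        # The empty set has exactly one partition: the one with no blocks.
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Option 1: add `first` to each existing block in turn.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # Option 2: start a new block that contains only `first`.
        yield [[first]] + partition


# Hypothetical batch of three queries; each block is one candidate cluster.
queries = ["q1", "q2", "q3"]
strategies = list(set_partitions(queries))
for strategy in strategies:
    print(strategy)
```

The number of partitions grows as the Bell numbers (5 for three queries, 15 for four, 52 for five), which suggests why dominance properties and heuristics for pruning candidate strategies matter as m grows.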
