Optimising the distributed execution of join queries in polynomial time

Abstract

It is shown that an optimal strategy for executing a join query in a distributed database system can be computed in time bounded by a polynomial function of the number of relations and the size parameters of the network. The solution presented accounts for both the transmission costs and the processing costs incurred in delivering the required result to the user who issued the query. The query specifies that several relational tables are to be joined and the result presented to the appropriate user. Because this task consumes limited system resources, a strategy should be devised that fulfils the request at minimal cost to the system. Both the processor sites and the communication links that interconnect them are used; an optimal strategy is one that minimises a weighted sum of processing and data transmission costs. An integer linear programming model of this problem was originally proposed in [1]; however, no method was suggested for solving that model efficiently. Extending the earlier analysis reveals the recursive nature of the join computation. Further investigation then yields a modified relationship that is amenable to algorithmic solution; the resulting procedure has polynomial time and space requirements.
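The abstract does not spell out the procedure itself. As a rough illustration of why a recursive formulation of the join computation can be solved in polynomial time, the following Python sketch uses a deliberately simpler model than the paper's integer linear program. It assumes a fixed join tree with known result sizes, executes each join at a single site, and charges hypothetical per-unit costs `trans[a][b]` (shipping from site a to site b) and `proc[s]` (processing at site s), weighted by `w_t` and `w_p`. It is a standard dynamic program over the tree, not the author's method; `JoinNode` and `min_cost` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

INF = float("inf")


@dataclass
class JoinNode:
    """A node of a fixed join tree: a base relation (leaf) or a join of its children."""
    name: str
    size: float  # size of this node's result, in hypothetical data units
    children: List["JoinNode"] = field(default_factory=list)
    home: Optional[int] = None  # site storing a base relation (leaves only)


def min_cost(
    root: JoinNode,
    sites: List[int],
    user_site: int,
    trans: List[List[float]],
    proc: List[float],
    w_t: float = 1.0,
    w_p: float = 1.0,
) -> float:
    """Minimum weighted cost of delivering root's result to user_site.

    Assumed cost model: trans[a][b] is the per-unit cost of shipping data from
    site a to site b (trans[a][a] == 0); proc[s] is the per-unit cost of
    processing join input at site s.
    """

    def best(node: JoinNode) -> Dict[int, float]:
        # best(node)[s]: cheapest cost of having node's result available at site s.
        if not node.children:
            # A base relation is available at zero cost only at its home site.
            return {s: 0.0 if s == node.home else INF for s in sites}
        child_tables = [best(c) for c in node.children]
        input_size = sum(c.size for c in node.children)
        table = {}
        for s in sites:
            cost = w_p * proc[s] * input_size  # assumed linear processing cost
            for child, ctab in zip(node.children, child_tables):
                # Each operand may be produced at any site t, then shipped to s.
                cost += min(ctab[t] + w_t * trans[t][s] * child.size for t in sites)
            table[s] = cost
        return table

    final = best(root)
    # Ship the final result from wherever it is computed to the user's site.
    return min(final[s] + w_t * trans[s][user_site] * root.size for s in sites)
```

For example, with R1 (size 10) at site 0, R2 (size 5) at site 1, a join result of size 4, `trans = [[0, 2], [2, 0]]`, `proc = [1, 3]` and the user at site 0, the cheapest plan ships R2 to site 0 and joins there, for a total cost of 25. Because each node's table depends only on its children's tables, every node is evaluated once, so the sketch runs in O(n·m²) time for n tree nodes and m sites.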

[1] Serge Abiteboul et al. Equivalence and optimization of relational transactions, 1988, JACM.

[2] Maria E. Orlowska et al. Allocating relations in a distributed database system, 1995.

[3] D. J. Reid. Optimal distributed execution of join queries, 1994.

[4] Catriel Beeri et al. Properties of acyclic database schemes, 1981, STOC '81.

[5] E. F. Codd et al. Relational database: a practical foundation for productivity, 1982, CACM.

[6] Alfred V. Aho et al. The theory of joins in relational data bases, 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).

[7] Nathan Goodman et al. Tree queries: a simple class of relational queries, 1982, TODS.

[8] Jeffrey D. Ullman et al. Principles of Database Systems, 1980.

[9] Averill M. Law et al. The art and theory of dynamic programming, 1977.

[10] D. J. Reid et al. Executing join queries in an uncertain distributed environment, 1995.

[11] S. Vajda et al. Integer Programming and Network Flows, 1970.

[12] Chihping Wang. The complexity of processing tree queries in distributed databases, 1990, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing 1990.

[13] Philip A. Bernstein et al. Using Semi-Joins to Solve Relational Queries, 1981, JACM.

[14] Alan R. Hevner et al. Query Processing in Distributed Database System, 1979, IEEE Transactions on Software Engineering.

[15] Laurence A. Wolsey et al. Generalized dynamic programming methods in integer programming, 1973, Math. Program.

[16] Don Batory et al. Query Processing in Database Systems, 2011, Topics in Information Systems.

[17] Hamdy A. Taha et al. Integer Programming: Theory, Applications, and Computations, 1975.

[18] D. J. Reid et al. Minimizing the response time of executing a join between fragmented relations in a distributed database system, 1997.

[19] Ronald Fagin et al. Degrees of acyclicity for hypergraphs and relational database schemes, 1983, JACM.

[20] Alexander Schrijver et al. Theory of linear and integer programming, 1986, Wiley-Interscience series in discrete mathematics and optimization.

[21] Jorma Rissanen et al. Independent components of relations, 1977, TODS.

[22] D. J. Reid. Incorporating processor costs in optimizing the distributed execution of join queries, 1994.

[23] Yahiko Kambayashi et al. Processing Cyclic Queries, 1985, Query Processing in Database Systems.

[24] Maria E. Orlowska et al. The propagation of updates to relational tables in a distributed database system, 1996.

[25] D. J. Reid. Evaluating multiple join queries in a distributed database system, 1995.