An Adaptive Probe-Based Technique to Optimize Join Queries in Distributed Internet Databases

We develop an adaptive probe-based optimization technique and demonstrate it in an Internet-based distributed database environment. Database systems distributed across servers that communicate via the Internet are increasingly common, and a query issued at one site may require data from remote sites. Optimizing the response time of such queries is challenging because server performance and network traffic at the time of data shipment are unpredictable, so a static query optimizer may select an expensive query plan. We constructed an experimental setup consisting of two servers running the same DBMS and connected via the Internet. Concentrating on join queries, we demonstrate how a static query optimizer can mistakenly choose an expensive plan. The causes are a lack of a priori knowledge of the run-time environment, inaccurate statistical assumptions in size estimation, and neglect of the cost of remote method invocation. We address these shortcomings collectively by proposing a probing mechanism. We then extend the mechanism with an adaptive technique that detects sub-optimality of a plan during query execution and attempts to switch to the cheapest plan while avoiding redundant work and imposing little overhead. We implemented our run-time optimization technique for join queries in Java and incorporated it into the experimental setup. The results demonstrate the superiority of our probe-based optimization over static optimization.
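The abstract does not spell out the probing mechanism or the switching rule, so the following is only a minimal Java sketch of the general idea, not the authors' implementation. It assumes a simple linear cost model (measured per-tuple cost times tuple count) and two candidate plans for a two-site join. The class name `ProbeJoinPlanner`, the plan names, and the `switchOverheadMs` parameter are all hypothetical.

```java
// Illustrative sketch only: a linear cost model is ASSUMED, not taken from the paper.
final class ProbeJoinPlanner {

    // Two hypothetical plans for joining a local relation with a remote one.
    enum Plan { SHIP_LOCAL_TO_REMOTE, SHIP_REMOTE_TO_LOCAL }

    // Per-tuple cost in milliseconds, derived from timing a small probe
    // that processed sampleSize tuples in probeElapsedMs.
    static double perTupleCostMs(long probeElapsedMs, int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        return (double) probeElapsedMs / sampleSize;
    }

    // Extrapolate the probe measurement to the full relation (assumed linear).
    static double estimateCostMs(double perTupleMs, long tuples) {
        return perTupleMs * tuples;
    }

    // Pick the plan with the lower estimated cost; ties keep the first plan.
    static Plan choosePlan(double shipLocalCostMs, double shipRemoteCostMs) {
        return shipLocalCostMs <= shipRemoteCostMs
                ? Plan.SHIP_LOCAL_TO_REMOTE
                : Plan.SHIP_REMOTE_TO_LOCAL;
    }

    // Adaptive check during execution: switch only if the projected remaining
    // cost of the current plan exceeds the alternative's estimated cost plus
    // an assumed fixed switching overhead.
    static boolean shouldSwitch(long tuplesDone, long elapsedMs, long totalTuples,
                                double alternativeCostMs, double switchOverheadMs) {
        if (tuplesDone <= 0) {
            return false; // nothing observed yet, keep the current plan
        }
        double observedPerTupleMs = (double) elapsedMs / tuplesDone;
        double remainingMs = observedPerTupleMs * (totalTuples - tuplesDone);
        return remainingMs > alternativeCostMs + switchOverheadMs;
    }
}
```

In this sketch, a caller would time a probe for each plan, pass the results through `perTupleCostMs` and `estimateCostMs`, and select a plan with `choosePlan`. Calling `shouldSwitch` periodically during execution mirrors the adaptive step, under the assumption that cost observed so far predicts the remaining cost.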
