Cost-based query scrambling for initial delays

Remote data access from disparate sources across a wide-area network such as the Internet is problematic because the communications medium is unpredictable and little is known about the load and potential delays at remote sites. Traditional static query processing approaches break down in this environment because they cannot adapt to unexpected delays. Query scrambling has been proposed to address this problem. Scrambling modifies query execution plans on the fly when delays are encountered at runtime. In its original formulation, scrambling was based on simple heuristics which, although they performed well in many cases, were also shown to be susceptible to problems resulting from bad scrambling decisions. In this paper we address these shortcomings by investigating ways to exploit query optimization technology to make intelligent scrambling choices. We propose three different approaches to using query optimization for scrambling. These approaches vary, for example, in whether they optimize for total work or for response time, and in whether they construct partial or complete alternative plans. Using a two-phase randomized query optimizer, a distributed query processing simulator, and a workload derived from queries of the TPC-D benchmark, we evaluate these approaches and compare their ability to cope with initial delays in accessing remote sources. The results show that cost-based scrambling can effectively hide initial delays, but that in the absence of good predictions of expected delay durations, there are fundamental tradeoffs between risk aversion and effectiveness.
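The tradeoff between total work and response time can be illustrated with a toy cost model. The sketch below is not the paper's optimizer or simulator: the `Plan` fields, the `response_time` formula, and the cost numbers are all hypothetical, chosen only to show how the preferred plan can change with the assumed delay of a remote source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan summary; both costs are in arbitrary time units."""
    name: str
    independent_work: float  # work that can run while the delayed source is silent
    dependent_work: float    # work that must wait for the delayed source

    @property
    def total_work(self) -> float:
        return self.independent_work + self.dependent_work


def response_time(plan: Plan, delay: float) -> float:
    # Assumption: independent work overlaps fully with the delay, and
    # dependent work starts only after both have finished.
    return max(plan.independent_work, delay) + plan.dependent_work


def choose_plan(plans: list[Plan], assumed_delay: float) -> Plan:
    # Pick the plan with the lowest estimated response time for the assumed delay.
    return min(plans, key=lambda p: response_time(p, assumed_delay))


# Illustrative numbers only: the scrambled plan does more total work,
# but part of that work can be hidden behind the delay.
original = Plan("original", independent_work=0.0, dependent_work=100.0)
scrambled = Plan("scrambled", independent_work=60.0, dependent_work=70.0)

for delay in (0.0, 80.0):
    best = choose_plan([original, scrambled], delay)
    print(delay, best.name, response_time(best, delay))
```

Under these assumed numbers the original plan wins when no delay is expected, while the scrambled plan wins once the assumed delay is long enough to absorb its extra work. A poor delay estimate can therefore lead to paying the extra work without any benefit.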
