Cache investment: integrating query optimization and distributed data placement

Emerging distributed query-processing systems support flexible execution strategies in which each query can be run using a combination of data shipping and query shipping. As in any distributed environment, these systems can obtain tremendous performance and availability benefits by employing dynamic data caching. When flexible execution and dynamic caching are combined, however, a circular dependency arises: caching occurs as a by-product of query operator placement, but query operator placement decisions are based on (cached) data location. The practical impact of this dependency is that query optimization decisions that appear valid on a per-query basis can actually cause suboptimal performance for all queries in the long run. To address this problem, we developed Cache Investment, a novel approach for integrating query optimization and data placement that looks beyond the performance of a single query. Cache Investment sometimes intentionally generates a “suboptimal” plan for a particular query in the interest of effecting a better data placement for subsequent queries. Cache Investment can be integrated into a distributed database system without changing the internals of the query optimizer. In this paper, we propose Cache Investment mechanisms and policies and analyze their performance. The analysis uses results from both an implementation on the SHORE storage manager and a detailed simulation model. Our results show that Cache Investment can significantly improve the overall performance of a system and demonstrate the trade-offs among various alternative policies.
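
The circular dependency between operator placement and caching can be illustrated with a toy model. The Python sketch below is not the paper's mechanism or any of its policies; the function name `run_workload` and all cost values are hypothetical, chosen only to show how a per-query greedy choice may lose to an "investing" choice over a sequence of queries. It assumes a single client that repeatedly issues the same query, that data shipping leaves the data cached at the client, and that query shipping caches nothing.

```python
def run_workload(num_queries, invest,
                 ship_data_cost=10,   # hypothetical: fetch data to client, then run locally
                 query_ship_cost=4,   # hypothetical: run the query at the server
                 cached_cost=1):      # hypothetical: run locally on already-cached data
    """Return the total cost of num_queries identical queries under a toy model."""
    cached = False
    total = 0
    for _ in range(num_queries):
        if cached:
            # Data is already at the client, so local execution is cheapest.
            total += cached_cost
        elif invest or ship_data_cost < query_ship_cost:
            # Data shipping: costly now, but caching happens as a by-product.
            total += ship_data_cost
            cached = True
        else:
            # Greedy per-query choice: query shipping is cheaper for this
            # query, but nothing is cached, so the next query faces the
            # same decision again.
            total += query_ship_cost
    return total


greedy_total = run_workload(10, invest=False)
invest_total = run_workload(10, invest=True)
print(greedy_total, invest_total)
```

Under these assumed costs, the greedy policy never populates the cache, while the investing policy pays more for the first query and less for every later one. For a single query the greedy choice is cheaper, which is why any investment decision depends on the expected future workload.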
