Apollo: Learning Query Correlations for Predictive Caching in Geo-Distributed Systems

The performance of modern geo-distributed database applications increasingly depends on remote access latency. Systems that cache query results to bring data closer to clients are gaining popularity, but they do not dynamically learn and exploit access patterns in client workloads. We present a novel prediction framework that learns workload characteristics from data access patterns and uses them to exploit relationships among the queries in an application’s database workload. We have designed and implemented this framework as Apollo, a system that learns query patterns and adaptively uses them to predict future queries and cache their results. Through extensive experimentation with two different benchmarks, we show that Apollo significantly reduces query response time compared with popular caching solutions. Our experiments also demonstrate Apollo’s robustness and scalability as a predictive cache for geo-distributed database applications.
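To make the idea of learning query correlations concrete, the following is a minimal sketch, not Apollo's actual implementation. It assumes that queries are reduced to templates by replacing literals with placeholders, and that a query is worth predicting when it has frequently followed the current query within a short window. The names `template`, `CorrelationModel`, `window`, `threshold`, and `min_support` are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict, deque


def template(sql: str) -> str:
    """Replace literals with '?' so queries differing only in parameters match."""
    sql = re.sub(r"'[^']*'", "?", sql)   # quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # standalone integer literals
    return " ".join(sql.split()).lower()


class CorrelationModel:
    """Hypothetical model: counts how often one template follows another."""

    def __init__(self, window: int = 2, threshold: float = 0.5, min_support: int = 2):
        self.threshold = threshold        # minimum estimated follow probability
        self.min_support = min_support    # minimum sightings before predicting
        self.seen = defaultdict(int)      # template -> number of occurrences
        self.follows = defaultdict(lambda: defaultdict(int))  # a -> b -> count
        self.recent = deque(maxlen=window)  # last `window` observed templates

    def observe(self, sql: str) -> str:
        """Record one executed query and update the follow counts."""
        t = template(sql)
        for prev in set(self.recent):
            if prev != t:
                self.follows[prev][t] += 1
        self.seen[t] += 1
        self.recent.append(t)
        return t

    def predict(self, sql: str) -> list[str]:
        """Return templates that have often followed this query's template."""
        t = template(sql)
        n = self.seen.get(t, 0)
        if n < self.min_support:
            return []
        return sorted(
            b for b, c in self.follows[t].items() if c / n >= self.threshold
        )


if __name__ == "__main__":
    model = CorrelationModel()
    for user_id in range(1, 4):
        model.observe(f"SELECT * FROM users WHERE id = {user_id}")
        model.observe(f"SELECT * FROM orders WHERE user_id = {user_id}")
    # Candidate templates a predictive cache could execute ahead of time.
    print(model.predict("SELECT * FROM users WHERE id = 42"))
```

In this sketch, a cache would call `predict` on each incoming query and execute the returned templates ahead of time. How parameters are bound to a predicted template, and how cached results are invalidated, are deliberately left out because the abstract does not describe them.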
