Releasing Cloud Databases from the Chains of Performance Prediction Models

The onset of cloud computing has brought about computing power that can be provisioned and released on demand. This capability has drastically increased the complexity of workload and resource management for database applications. Existing solutions rely on query latency prediction models, which are notoriously inaccurate in cloud environments. We argue for a substantial shift away from query performance prediction models and towards machine learning techniques that directly model the monetary cost of using cloud resources and processing query workloads on them. Towards this end, we sketch the design of a learning-based service for IaaS-deployed data management applications that uses reinforcement learning to learn, over time, low-cost policies for provisioning virtual machines and dispatching queries across them. Our service can effectively handle dynamic workloads and changes in resource availability, leading to applications that are continuously adaptable, cost-effective, and performance-aware. In this paper, we discuss several challenges involved in building such a service, and we present results from a proof-of-concept implementation of our approach.
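To make the idea of learning directly from monetary cost more concrete, here is a minimal sketch. It is not the authors' design. It assumes a stateless multi-armed bandit, the simplest special case of reinforcement learning, with epsilon-greedy exploration. It also collapses provisioning and dispatching into a single per-query choice of VM type, and it ignores queueing. The VM types, hourly prices, speedups, deadline, and linear SLA penalty are all hypothetical, as are the names `VMType`, `query_cost`, `EpsilonGreedyDispatcher`, and `simulate`. The point is that the learner only sees the dollar cost each query actually incurred. It never consults a latency prediction model.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str
    price_per_hour: float  # hypothetical on-demand price in dollars
    speedup: float         # hypothetical speed relative to a baseline VM


def query_cost(base_latency_s: float, vm: VMType, deadline_s: float,
               penalty_per_s: float) -> float:
    """Monetary cost of one query: VM rental for its runtime plus a
    (hypothetical) linear penalty for every second past the deadline."""
    latency = base_latency_s / vm.speedup
    rental = vm.price_per_hour * latency / 3600.0
    penalty = penalty_per_s * max(0.0, latency - deadline_s)
    return rental + penalty


class EpsilonGreedyDispatcher:
    """Stateless bandit: learns which VM type has the lowest mean observed cost."""

    def __init__(self, arms, epsilon: float = 0.1, seed: int = 0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.mean_cost = {a: 0.0 for a in self.arms}

    def choose(self) -> VMType:
        # Try every arm once before trusting the running averages.
        untried = [a for a in self.arms if self.counts[a] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon; otherwise exploit the cheapest arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return min(self.arms, key=lambda a: self.mean_cost[a])

    def update(self, arm: VMType, cost: float) -> None:
        # Incremental mean of the observed monetary cost for this arm.
        self.counts[arm] += 1
        self.mean_cost[arm] += (cost - self.mean_cost[arm]) / self.counts[arm]


SMALL = VMType("small", price_per_hour=0.10, speedup=1.0)
LARGE = VMType("large", price_per_hour=0.50, speedup=4.0)


def simulate(rounds: int = 2000, deadline_s: float = 30.0,
             seed: int = 1) -> EpsilonGreedyDispatcher:
    env_rng = random.Random(seed)
    agent = EpsilonGreedyDispatcher([SMALL, LARGE], epsilon=0.1, seed=seed)
    for _ in range(rounds):
        base = env_rng.uniform(20.0, 60.0)  # hypothetical baseline latency (s)
        vm = agent.choose()
        cost = query_cost(base, vm, deadline_s=deadline_s, penalty_per_s=0.001)
        agent.update(vm, cost)
    return agent


if __name__ == "__main__":
    agent = simulate()
    for vm in agent.arms:
        print(vm.name, agent.counts[vm], round(agent.mean_cost[vm], 5))
```

Under these made-up numbers, the large VM costs more per second but rarely misses the 30-second deadline. The learner therefore converges to sending most queries there. With a much looser deadline, the penalty vanishes and the cheaper small VM would win instead. A full policy of the kind described above would also condition on state (running VMs, queued queries) and decide separately when to rent or release machines.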
