OPTIMUSCLOUD: Heterogeneous Configuration Optimization for Distributed Databases in the Cloud

Achieving cost and performance efficiency for cloud-hosted databases requires exploring a large configuration space that includes both the parameters exposed by the database and the variety of VM configurations available in the cloud. Even small deviations from an optimal configuration have significant consequences for performance and cost. Existing systems that automate cloud deployment configuration can select near-optimal instance types for homogeneous clusters of virtual machines and for stateless, recurrent data analytics workloads. We show that to find optimal performance-per-$ cloud deployments for NoSQL database applications, it is important to (1) consider heterogeneous cluster configurations, (2) jointly optimize database and VM configurations, and (3) dynamically adjust the configuration as workload behavior changes. We present OPTIMUSCLOUD, an online reconfiguration system that can efficiently perform such joint and heterogeneous configuration for dynamic workloads. We evaluate our system with two clustered NoSQL systems, Cassandra and Redis, using three representative workloads, and show that OPTIMUSCLOUD provides 40% higher throughput/$ and 4.5× lower 99th-percentile latency on average compared to the state-of-the-art prior systems CherryPick, Selecta, and SOPHIA.
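To make the performance-per-$ objective concrete, the following minimal sketch enumerates small clusters drawn from a catalog of instance types and picks the one with the highest throughput per dollar that still meets a throughput floor. It is an illustration only, not the OPTIMUSCLOUD algorithm: the names `InstanceType`, `perf_per_dollar`, and `best_cluster` are hypothetical, the prices and throughputs are placeholder values rather than measurements, and the additive throughput model is a deliberate simplification standing in for whatever performance predictor a real system would use.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class InstanceType:
    name: str
    price_per_hour: float  # hypothetical price in $/hour
    ops_per_sec: float     # hypothetical per-node throughput


def perf_per_dollar(cluster):
    """Throughput per dollar, assuming node throughputs simply add up."""
    total_ops = sum(vm.ops_per_sec for vm in cluster)
    total_price = sum(vm.price_per_hour for vm in cluster)
    return total_ops / total_price


def best_cluster(catalog, size, min_ops_per_sec):
    """Exhaustively search all clusters of `size` nodes (mixed types allowed)
    and return the feasible one with the highest throughput per dollar."""
    feasible = (
        cluster
        for cluster in combinations_with_replacement(catalog, size)
        if sum(vm.ops_per_sec for vm in cluster) >= min_ops_per_sec
    )
    best = max(feasible, key=perf_per_dollar, default=None)
    if best is None:
        raise ValueError("no cluster meets the throughput requirement")
    return best


if __name__ == "__main__":
    # Placeholder catalog; the numbers are illustrative, not measured.
    catalog = [
        InstanceType("small", price_per_hour=1.0, ops_per_sec=100.0),
        InstanceType("large", price_per_hour=4.0, ops_per_sec=300.0),
    ]
    choice = best_cluster(catalog, size=3, min_ops_per_sec=450.0)
    print([vm.name for vm in choice], perf_per_dollar(choice))
```

With these placeholder numbers, no homogeneous cluster of the cheaper type meets the throughput floor, so a mixed cluster can come out ahead of every homogeneous option. Exhaustive enumeration is only workable for tiny catalogs; the joint database-and-VM search space described above is far larger and would need a more efficient search strategy.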
