SOPHIA: Online Reconfiguration of Clustered NoSQL Databases for Time-Varying Workloads

Reconfiguring NoSQL databases as workload patterns change is crucial for maximizing database throughput. However, this is challenging because the configuration parameter search space is large and the parameters have complex interdependencies. While state-of-the-art systems can automatically identify close-to-optimal configurations for static workloads, they perform poorly under dynamic workloads because of three fundamental limitations. First, they do not account for performance degradation during reconfiguration (for example, due to a database restart, which is often needed to apply the new configuration). Second, they do not account for how transient the new workload pattern will be. Third, they overlook the application’s availability requirements during reconfiguration. Our solution, SOPHIA, addresses all these shortcomings. Its fundamental technical contribution is a cost-benefit analyzer that computes the relative cost and benefit of each reconfiguration action and determines a reconfiguration plan for a future time window. The plan specifies when to reconfigure and which configuration to switch to. We demonstrate its effectiveness on three workload traces: a multi-tenant, global-scale metagenomics repository (MG-RAST), a bus-tracking workload (Tiramisu), and an HPC data-analytics job queue, all with varying levels of workload complexity and all exhibiting dynamic workload changes. We compare the throughput benefit of SOPHIA against the default configuration, a static configuration, and a theoretically ideal solution for two widely used NoSQL databases, Cassandra and Redis.
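
To make the cost-benefit idea concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a predicted workload label per time slot, a hypothetical lookup table of throughput per (configuration, workload) pair, and a single fixed reconfiguration cost expressed in lost operations (for example, from a restart). Every name here (`plan_reconfigurations`, `switch_cost_ops`, the configuration labels) is an assumption made for illustration. A simple dynamic program then picks, for each slot in the look-ahead window, the configuration that maximizes total expected operations net of switching costs.

```python
from typing import Dict, List, Tuple


def plan_reconfigurations(
    workloads: List[str],
    throughput: Dict[Tuple[str, str], float],
    configs: List[str],
    initial: str,
    slot_seconds: float,
    switch_cost_ops: float,
) -> Tuple[float, List[str]]:
    """Return (total expected ops, configuration chosen for each slot).

    workloads       -- predicted workload label for each future time slot
    throughput      -- hypothetical table: (config, workload) -> ops/second
    switch_cost_ops -- assumed ops lost whenever the configuration changes
    """
    # best[c] = (net ops so far, plan so far) over plans that end in config c.
    best = {c: (float("-inf"), []) for c in configs}
    best[initial] = (0.0, [])

    for w in workloads:
        nxt = {}
        for c in configs:
            gain = throughput[(c, w)] * slot_seconds  # benefit of running c
            candidates = []
            for prev, (ops, plan) in best.items():
                cost = 0.0 if prev == c else switch_cost_ops  # cost of switching
                candidates.append((ops - cost + gain, plan + [c]))
            nxt[c] = max(candidates, key=lambda t: t[0])
        best = nxt

    return max(best.values(), key=lambda t: t[0])


if __name__ == "__main__":
    table = {
        ("A", "read"): 100.0, ("A", "write"): 40.0,
        ("B", "read"): 50.0, ("B", "write"): 90.0,
    }
    forecast = ["read", "write", "read"]
    # Expensive reconfiguration: a one-slot write burst is not worth two switches.
    print(plan_reconfigurations(forecast, table, ["A", "B"], "A", 10.0, 600.0))
    # Cheap reconfiguration: switching to B for the write slot pays off.
    print(plan_reconfigurations(forecast, table, ["A", "B"], "A", 10.0, 200.0))
```

With these made-up numbers, the planner stays on configuration A when switching is expensive and switches to B for the write slot when it is cheap. This reflects the point made above: whether a reconfiguration is worthwhile depends on both its cost and how transient the new workload is.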

[1]  Andreas Wilke,et al.  MG-RAST version 4 - lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis , 2019, Briefings Bioinform..

[2]  Simon Oberthür,et al.  Dynamic online reconfiguration for customizable and self-optimizing operating systems , 2005, EMSOFT.

[3]  Dilma Da Silva,et al.  System Support for Online Reconfiguration , 2003, USENIX Annual Technical Conference, General Track.

[4]  Albert G. Greenberg,et al.  Scarlett: coping with skewed content popularity in mapreduce clusters , 2011, EuroSys '11.

[5]  Jeremy Leipzig,et al.  A review of bioinformatic pipeline frameworks , 2016, Briefings Bioinform..

[6]  Bowen Zhou,et al.  Mitigating interference in cloud services by middleware reconfiguration , 2014, Middleware.

[7]  Haibo Chen,et al.  Replication-driven Live Reconfiguration for Fast Distributed Transaction Processing , 2017, USENIX Annual Technical Conference.

[8]  Josiah L. Carlson,et al.  Redis in Action , 2013 .

[9]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[10]  Liuba Shrira,et al.  Modular Software Upgrades for Distributed Systems , 2006, ECOOP.

[11]  Anthony K. H. Tung,et al.  A new approach to dynamic self-tuning of database buffers , 2008, TOS.

[12]  Chunjie Luo,et al.  Characterizing data analysis workloads in data centers , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).

[13]  Shu Wang,et al.  Understanding and Auto-Adjusting Performance-Sensitive Configurations , 2018, ASPLOS.

[14]  Lin Ma,et al.  Query-based Workload Forecasting for Self-Driving Database Management Systems , 2018, SIGMOD Conference.

[15]  Chun Zhang,et al.  Automating physical database design in a parallel database , 2002, SIGMOD '02.

[16]  John Bent,et al.  MDHIM: A Parallel Key/Value Framework for HPC , 2015, HotStorage.

[17]  Ananth Grama,et al.  EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm , 2016, Scientific Reports.

[18]  Tommaso Cucinotta,et al.  The effects of scheduling, workload type and consolidation scenarios on virtual machine performance and their prediction through optimized artificial neural networks , 2011, J. Syst. Softw..

[19]  Daniel C. Zilio,et al.  DB2 advisor: an optimizer smart enough to recommend its own indexes , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[20]  Jinkyu Koo,et al.  Tiresias: Context-sensitive Approach to Decipher the Presence and Strength of MicroRNA Regulatory Interactions , 2018, Theranostics.

[21]  Geoffrey J. Gordon,et al.  Automatic Database Management System Tuning Through Large-scale Machine Learning , 2017, SIGMOD Conference.

[22]  Indranil Gupta,et al.  Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems , 2015, IEEE Transactions on Emerging Topics in Computing.

[23]  Ninghui Li,et al.  Federation in genomics pipelines: techniques and challenges , 2019, Briefings Bioinform..

[24]  Li Zhang,et al.  MRONLINE: MapReduce online performance tuning , 2014, HPDC '14.

[25]  Robert Ricci,et al.  Rocksteady: Fast Migration for Low-latency In-memory Storage , 2017, SOSP.

[26]  Robert L. Henderson,et al.  Job Scheduling Under the Portable Batch System , 1995, JSSPP.

[27]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[28]  Yuqing Zhu,et al.  BestConfig: tapping the performance potential of systems via automatic configuration tuning , 2017, SoCC.

[29]  Srikanth Kandula,et al.  Jockey: guaranteed job latency in data parallel clusters , 2012, EuroSys '12.

[30]  Shivnath Babu,et al.  Tuning Database Configuration Parameters with iTuned , 2009, Proc. VLDB Endow..

[31]  Stanley B. Zdonik,et al.  On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems , 2011, Proc. VLDB Endow..

[32]  Carlo Curino,et al.  Schism , 2010, Proc. VLDB Endow..

[33]  Mohamed F. Mokbel,et al.  SARD: A statistical approach for ranking database tuning parameters , 2008, 2008 IEEE 24th International Conference on Data Engineering Workshop.

[34]  Alberto Bartoli,et al.  Online reconfiguration in replicated databases based on group communication , 2001, 2001 International Conference on Dependable Systems and Networks.

[35]  Saurabh Bagchi,et al.  SARVAVID: A Domain Specific Language for Developing Scalable Computational Genomics Applications , 2016, ICS.

[36]  Saurabh Bagchi,et al.  Rafiki: a middleware for parameter tuning of NoSQL datastores for dynamic metagenomics workloads , 2017, Middleware.

[37]  Raghu Ramakrishnan,et al.  bLSM: a general purpose log structured merge tree , 2012, SIGMOD Conference.

[38]  Surajit Chaudhuri,et al.  Table of Contents (pdf) , 2007, VLDB.

[39]  Vivek R. Narasayya,et al.  Integrating vertical and horizontal partitioning into automated physical database design , 2004, SIGMOD '04.

[40]  Rui Zhang,et al.  Finding the Big Data Sweet Spot: Towards Automatically Recommending Configurations for Hadoop Clusters on Docker Containers , 2015, 2015 IEEE International Conference on Cloud Engineering.

[41]  Daniel C. Zilio,et al.  Physical database design decision algorithms and concurrent reorganization for parallel database systems , 1998 .

[42]  Surajit Chaudhuri,et al.  An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server , 1997, VLDB.

[43]  Divyakant Agrawal,et al.  Albatross: Lightweight Elasticity in Shared Storage Databases for the Cloud using Live Data Migration , 2011, Proc. VLDB Endow..

[44]  Christos Faloutsos,et al.  Forecasting Big Time Series: Old and New , 2018, Proc. VLDB Endow..

[45]  Margo I. Seltzer,et al.  Using probabilistic reasoning to automate software tuning , 2004, SIGMETRICS '04/Performance '04.

[46]  Ananth Grama,et al.  Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions , 2016, BMC Systems Biology.

[47]  Scott Nettles,et al.  Dynamic software updating , 2001, PLDI '01.

[48]  Prashant Malik,et al.  Cassandra: a decentralized structured storage system , 2010, OPSR.

[49]  Prashant J. Shenoy,et al.  ShuttleDB: Database-Aware Elasticity in the Cloud , 2014, ICAC.

[50]  David K. Gifford,et al.  Weighted voting for replicated data , 1979, SOSP '79.

[51]  Carlo Curino,et al.  Performance and resource modeling in highly-concurrent OLTP workloads , 2013, SIGMOD '13.

[52]  Saurabh Bagchi,et al.  ICE: An Integrated Configuration Engine for Interference Mitigation in Cloud Services , 2015, 2015 IEEE International Conference on Autonomic Computing.

[53]  Ion Stoica,et al.  BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores , 2016, NSDI.

[54]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[55]  Douglas B. Terry,et al.  A Self-Configurable Geo-Replicated Cloud Storage System , 2014, OSDI.

[56]  Divyakant Agrawal,et al.  Zephyr: live migration in shared nothing databases for elastic cloud platforms , 2011, SIGMOD '11.