Mowgli: Finding Your Way in the DBMS Jungle

Big Data and IoT applications require highly scalable database management systems (DBMSs), preferably operated in the cloud to ensure scalability at the resource level as well. Because the number of existing distributed DBMSs is extensive, selecting and operating a distributed DBMS in the cloud is a challenging task. While DBMS benchmarking can support this task, existing frameworks do not cope with the runtime constraints of distributed DBMSs and the volatility of cloud environments. Hence, DBMS evaluation frameworks need to consider DBMS runtime and cloud resource constraints to enable portable and reproducible results. In this paper, we present Mowgli, a novel evaluation framework that enables the evaluation of non-functional DBMS features in correlation with DBMS runtime and cloud resource constraints. Mowgli fully automates the execution of cloud- and DBMS-agnostic evaluation scenarios, including DBMS cluster adaptations. The evaluation of Mowgli is based on two IoT-driven scenarios, comprising the DBMSs Apache Cassandra and Couchbase, nine DBMS runtime configurations, and two cloud providers with two different storage backends. Mowgli automates the execution of the resulting 102 evaluation scenarios, verifying its support for portable and reproducible DBMS evaluations. The results provide extensive insights into DBMS scalability and the impact of different cloud resources. The significance of the results is validated by correlating them with existing DBMS evaluation results.
