Resource Usage Prediction in Distributed Key-Value Datastores

In order to attain the promises of the Cloud Computing paradigm, systems need to be able to transparently adapt to environment changes. Such behavior benefits from the ability to predict those changes in order to handle them seamlessly. In this paper, we present a mechanism to accurately predict the resource usage of distributed key-value datastores. Our mechanism requires offline training but, in contrast with other approaches, it is sufficient to run it only once per hardware configuration and subsequently use it for online prediction of database performance under any circumstance. The mechanism accurately estimates the database resource usage for any request distribution with an average accuracy of 94i?ź%, only by knowing two parameters: i cache hit ratio; and ii incoming throughput. Both input values can be observed in real time or synthesized for request allocation decisions. This novel approach is sufficiently simple and generic, while simultaneously being suitable for other practical applications.

[1]  Carlo Curino,et al.  Performance and resource modeling in highly-concurrent OLTP workloads , 2013, SIGMOD '13.

[2]  Surajit Chaudhuri,et al.  Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques , 2012, Proc. VLDB Endow..

[3]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[4]  Xifeng Yan,et al.  Workload characterization and prediction in the cloud: A multiple time series approach , 2012, 2012 IEEE Network Operations and Management Symposium.

[5]  Purushottam Kulkarni,et al.  Affinity-Aware Modeling of CPU Usage for Provisioning Virtualized Applications , 2011, 2011 IEEE 4th International Conference on Cloud Computing.

[6]  Carlo Curino,et al.  Workload-aware database monitoring and consolidation , 2011, SIGMOD '11.

[7]  R. Fisher 014: On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. , 1921 .

[8]  Paolo Romano,et al.  Enhancing Performance Prediction Robustness by Combining Analytical Modeling and Machine Learning , 2015, ICPE.

[9]  Ioannis Konstantinou,et al.  TIRAMOLA: elastic nosql provisioning through a cloud management platform , 2012, SIGMOD Conference.

[10]  Robert E. Tarjan,et al.  Amortized efficiency of list update and paging rules , 1985, CACM.

[11]  Mahadev Konar,et al.  ZooKeeper: Wait-free Coordination for Internet-scale Systems , 2010, USENIX ATC.

[12]  Prashant J. Shenoy,et al.  Profiling and Modeling Resource Usage of Virtualized Applications , 2008, Middleware.

[13]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[14]  Ivan T. Bowman,et al.  Predicting system performance for multi-tenant database workloads , 2011, DBTest '11.

[15]  João Paulo,et al.  MeT: workload aware elasticity for NoSQL , 2013, EuroSys '13.

[16]  Prashant J. Shenoy,et al.  Autonomic mix-aware provisioning for non-stationary data center workloads , 2010, ICAC '10.

[17]  Thomas Roberts Puzak,et al.  Analysis of cache replacement-algorithms , 1985 .

[18]  Prashant Malik,et al.  Cassandra: a decentralized structured storage system , 2010, OPSR.

[19]  Zhenhuan Gong,et al.  PRESS: PRedictive Elastic ReSource Scaling for cloud systems , 2010, 2010 International Conference on Network and Service Management.

[20]  Qi Zhang,et al.  A Regression-Based Analytic Model for Dynamic Resource Provisioning of Multi-Tier Applications , 2007, Fourth International Conference on Autonomic Computing (ICAC'07).

[21]  Bettina Kemme,et al.  Compaction Management in Distributed Key-Value Datastores , 2015, Proc. VLDB Endow..

[22]  Hao Che,et al.  Hierarchical Web caching systems: modeling, design and experimental results , 2002, IEEE J. Sel. Areas Commun..

[23]  Rolf Stadler,et al.  Resource Management in Clouds: Survey and Research Challenges , 2015, Journal of Network and Systems Management.

[24]  Carlo Curino,et al.  DBSeer: Resource and Performance Prediction for Building a Next Generation Database Cloud , 2013, CIDR.

[25]  José A. B. Fortes,et al.  On the Use of Machine Learning to Predict the Time and Resources Consumed by Applications , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.

[26]  Peter Desnoyers,et al.  Modellus: Automated modeling of complex internet data center applications , 2012, TWEB.

[27]  Lars George,et al.  HBase: The Definitive Guide , 2011 .