Cloud elasticity using probabilistic model checking

Cloud computing has become the leading paradigm for deploying large-scale infrastructures and running big data applications, owing to its capacity to achieve economies of scale. In this work, we focus on one of the most prominent advantages of cloud computing, namely on-demand resource provisioning, commonly referred to as elasticity. Although considerable effort has been invested in developing systems and mechanisms that enable elasticity, elasticity decision policies tend to be designed without guaranteeing or quantifying the quality of their operation. This work aims to make the development of elasticity policies more formalized and dependable. We make two distinct contributions. First, we propose an extensible approach to enforcing elasticity through the dynamic instantiation and online quantitative verification of Markov Decision Processes (MDPs) using probabilistic model checking. Second, we propose concrete elasticity models and related elasticity policies. We evaluate our decision policies using both real and synthetic datasets in clusters of NoSQL databases. According to the experimental results, our approach improves upon the state of the art by significantly increasing user-defined utility values and reducing violations of user-defined thresholds.
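To make the idea of an MDP-based elasticity decision more concrete, the following is a minimal sketch in Python and not the model proposed in this work. It assumes a toy MDP whose states are cluster sizes, whose actions are to remove a VM, do nothing, or add a VM, and whose probabilistic part is a hypothetical load forecast. In place of a probabilistic model checker, it uses plain backward induction over a short horizon to pick the action with the highest expected cumulative utility. All names, parameters, and numbers are assumptions chosen for illustration only.

```python
from functools import lru_cache

# Hypothetical parameters for illustration only (none come from the paper).
MIN_VMS, MAX_VMS = 1, 6
ACTIONS = {"remove": -1, "no_op": 0, "add": +1}
# Assumed load forecast: (requests per step, probability); probabilities sum to 1.
LOAD_DIST = [(200.0, 0.3), (500.0, 0.5), (900.0, 0.2)]
CAPACITY_PER_VM = 200.0  # requests one VM can serve per step (assumed)
COST_PER_VM = 0.5        # cost charged per VM per step (assumed)
GAIN_PER_REQ = 0.01      # gain per served request (assumed)


def utility(vms, load):
    """User-defined utility of one step: gain from served requests minus VM cost."""
    served = min(load, vms * CAPACITY_PER_VM)
    return GAIN_PER_REQ * served - COST_PER_VM * vms


def next_size(vms, action):
    """Cluster size after applying an action, clamped to the allowed range."""
    return max(MIN_VMS, min(MAX_VMS, vms + ACTIONS[action]))


@lru_cache(maxsize=None)
def best(vms, horizon):
    """Return (max expected cumulative utility, best first action) over `horizon` steps."""
    if horizon == 0:
        return 0.0, "no_op"
    best_value, best_action = float("-inf"), "no_op"
    for action in ACTIONS:
        size = next_size(vms, action)
        # Expected one-step utility under the assumed load distribution.
        expected = sum(p * utility(size, load) for load, p in LOAD_DIST)
        value = expected + best(size, horizon - 1)[0]
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


def decide(current_vms, horizon=3):
    """Elasticity decision for the current cluster size."""
    return best(current_vms, horizon)[1]
```

Calling decide(current_vms) returns the name of the action to take next. In an online setting, such a model would be re-instantiated with fresh monitoring data before each decision; here the load distribution is fixed purely to keep the sketch short.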
