Infra: SLO Aware Elastic Auto-scaling in the Cloud for Cost Reduction

Enterprises often host applications and services on clusters of virtual machine instances provided by cloud service providers such as Amazon, Rackspace, and Microsoft. Users pay a cloud usage cost based on the hourly usage [1] of the virtual machine instances composing the cluster. A cluster composition is the number of virtual machine instances of each type (from a predefined list of types) that make up a cluster. We present Infra, a cloud provisioning framework that predicts an (ϵ, δ)-minimum cluster composition required to run a given application workload on a cloud under an SLO (Service Level Objective) deadline. This paper does not present a new approximation algorithm; instead, we provide a tool that applies existing machine learning techniques to predict an (ϵ, δ)-minimum cluster composition. An (ϵ, δ)-minimum cluster composition is a cluster composition whose cost approximates that of the minimum cluster composition, i.e., the cluster composition that incurs the minimum cloud usage cost needed to execute a given application under an SLO deadline. The approximation bounds the error by a predefined threshold ϵ with a degree of confidence of 100 * (1 - δ)%; that is, the probability of failing to achieve the error threshold ϵ is at most δ. For ϵ = 0.1 and δ = 0.02, we experimentally demonstrate that an (ϵ, δ)-minimum cluster composition predicted by Infra successfully approximates the minimum cluster composition: the accuracy of predicting the minimum cluster composition ranges from 93.1% to 97.99% (the error is bounded by the error threshold of 0.1) with a 98% degree of confidence, since 100 * (1 - δ) = 98%.
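
To make the (ϵ, δ) definition concrete, the following Python sketch computes the hourly cost of a cluster composition and checks, over a set of trials, whether the predicted composition's cost stays within error ϵ of the minimum cost in at least a 1 - δ fraction of the trials. This is only an illustration: the instance types, the prices, the function names, and the reading of "error" as relative cost error are assumptions made here, not details taken from Infra, and an empirical frequency check is not a substitute for a formal guarantee.

```python
from typing import Dict, List, Tuple

# Hypothetical hourly prices per instance type (illustrative values only).
HOURLY_PRICE: Dict[str, float] = {"small": 0.05, "medium": 0.10, "large": 0.20}

# A cluster composition maps each instance type to a number of instances.
Composition = Dict[str, int]


def hourly_cost(composition: Composition) -> float:
    """Cloud usage cost per hour of a cluster composition."""
    return sum(HOURLY_PRICE[itype] * count for itype, count in composition.items())


def relative_error(predicted: Composition, minimum: Composition) -> float:
    """Relative cost error of a predicted composition (assumed error measure)."""
    min_cost = hourly_cost(minimum)
    return abs(hourly_cost(predicted) - min_cost) / min_cost


def satisfies_eps_delta(
    trials: List[Tuple[Composition, Composition]], eps: float, delta: float
) -> bool:
    """True if the fraction of trials whose error exceeds eps is at most delta.

    Each trial is a (predicted, minimum) pair of compositions.
    """
    failures = sum(1 for pred, best in trials if relative_error(pred, best) > eps)
    return failures / len(trials) <= delta
```

With ϵ = 0.1 and δ = 0.02, for instance, a set of 50 trials passes this check only if at most one trial has a relative cost error above 0.1.
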
Auto scaling is the process of automatically adding cloud instances to a cluster to adapt to an increase in application workload (increased request rate) and removing instances from the cluster when the workload decreases (reduced request rate). However, state-of-the-art auto scaling techniques have the following disadvantages: A) they require explicit policy definitions for changing the cluster configuration, and therefore cannot automatically adapt a cluster to a changing workload; B) they do not compute the amount of resources required, and therefore do not yield an “optimal” cluster composition. Infra provides an auto scaler that automatically adapts a cloud infrastructure to a changing application workload, scaling the cluster up or down based on predictions from the Infra provisioning tool.
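
The prediction-driven scaling described above can be sketched as a single reconciliation step: obtain a target composition for the observed request rate, then compute how many instances of each type to add or remove. This is a minimal sketch, not Infra's implementation; the `predict` callable stands in for the provisioning tool, and all names and the toy predictor in the usage note are hypothetical.

```python
from typing import Callable, Dict

# A cluster composition maps each instance type to a number of instances.
Composition = Dict[str, int]


def scaling_actions(current: Composition, target: Composition) -> Dict[str, int]:
    """Per-type instance deltas: positive means add, negative means remove."""
    actions: Dict[str, int] = {}
    for itype in sorted(set(current) | set(target)):
        delta = target.get(itype, 0) - current.get(itype, 0)
        if delta != 0:
            actions[itype] = delta
    return actions


def autoscale_step(
    current: Composition,
    request_rate: float,
    predict: Callable[[float], Composition],
) -> Dict[str, int]:
    """One scaling step driven by a caller-supplied predictor (hypothetical).

    No explicit scaling policy is defined here: the target composition
    comes entirely from the prediction for the observed request rate.
    """
    return scaling_actions(current, predict(request_rate))
```

For example, with a toy predictor such as `lambda rate: {"small": max(1, int(rate // 100))}`, calling `autoscale_step({"small": 2, "large": 1}, 500.0, predictor)` yields the deltas needed to reach five small instances and no large ones.
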

[1] Minseok Kwon, et al. Prediction-based virtual instance migration for balanced workload in the cloud datacenters, 2011.

[2] Chris Arney. Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World, 2014.

[3] Carlos A. Varela, et al. Accurate Resource Prediction for Hybrid IaaS Clouds Using Workload-Tailored Elastic Compute Units, 2013, 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing.

[4] Xiaohui Gu, et al. CloudScale: elastic resource scaling for multi-tenant cloud systems, 2011, SoCC.

[5] Guillaume Pierre, et al. Resource Provisioning of Web Applications in Heterogeneous Clouds, 2011, WebApps.

[6] Subhajit Sidhanta, et al. OptCon: An Adaptable SLA-Aware Consistency Tuning Framework for Quorum-Based Stores, 2016, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid).

[7] Hari Balakrishnan, et al. TCP ex machina: computer-generated congestion control, 2013, SIGCOMM.

[8] Marlon Dumas, et al. Towards a Model for Cloud Computing Cost Estimation with Reserved Instances, 2011.

[9] Gregory R. Ganger, et al. Applying Performance Models to Understand Data-Intensive Computing Efficiency, 2010.

[10] G. Schwarz. Estimating the Dimension of a Model, 1978.

[11] Albert Y. Zomaya, et al. On Modelling and Prediction of Total CPU Usage for Applications in MapReduce Environments, 2012, ICA3PP.

[12] Ratul Mahajan, et al. Timecard: controlling user-perceived delays in server-based mobile applications, 2013, SOSP.

[13] H. B. Mann, et al. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, 1947.

[14] Peter A. Flach, et al. Machine Learning - The Art and Science of Algorithms that Make Sense of Data, 2012.

[15] Marlon Dumas, et al. Towards a model for cloud computing cost estimation with reserved resources, 2010.

[16] Hari Balakrishnan, et al. Cicada: Introducing Predictive Guarantees for Cloud Networks, 2014, HotCloud.

[17] Adam Silberstein, et al. Benchmarking cloud serving systems with YCSB, 2010, SoCC '10.

[18] H. Akaike. A new look at the statistical model identification, 1974.

[19] Herodotos Herodotou, et al. No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics, 2011, SoCC.

[20] David R. Anderson, et al. Model selection and multimodel inference: a practical information-theoretic approach, 2003.

[21] Yoshua Bengio, et al. Learning Deep Architectures for AI, 2007, Found. Trends Mach. Learn.

[22] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SKDD.

[23] Prashant Malik, et al. Cassandra: a decentralized structured storage system, 2010, OPSR.

[24] David L. Mills, et al. Network Time Protocol (Version 3) Specification, Implementation, 1992.