Auto-scaling of Web Applications in Clouds: A Tail Latency Evaluation

Auto-scaling mechanisms dynamically add and remove Virtual Machines (VMs) to reduce cost while minimizing latency. Latency improvements are mainly achieved by minimizing the "average" response time, yet unpredictable workload fluctuations in Web applications, known as flash crowds, can still result in very high latencies for users’ requests. Requests affected by a flash crowd suffer from long latencies and are known as outliers. Such outliers are to a large extent inevitable as long as auto-scaling solutions keep improving the average rather than the "tail" of the latency distribution. In this paper, we study possible sources of tail latency in auto-scaling mechanisms for Web applications. Based on extensive evaluations on a real cloud platform, we identified the following sources of tail latency: 1) large requests, i.e., data-intensive ones; 2) long scaling intervals; 3) instant analysis of scaling parameters; 4) conservative, i.e., tight, threshold tuning; 5) load-unaware surplus VM selection policies used when executing a scale-down decision; 6) the cooldown feature, although it is cost-effective; and 7) VM start-up delay. We also found that after auto-scaling mechanisms improve the average latency, the tail may behave differently, which calls for dedicated tail-aware auto-scaling solutions.
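The distinction between average and tail latency can be illustrated with a minimal Python sketch. The latency values, the `percentile` helper, and the choice of the nearest-rank percentile definition are illustrative assumptions; none of them are data or methods taken from the paper.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that p = 0 returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (assumed values):
# 980 ordinary requests plus 20 outliers, e.g. requests hit by a flash crowd.
latencies_ms = [100] * 980 + [3000] * 20

avg_ms = mean(latencies_ms)              # average response time
p50_ms = percentile(latencies_ms, 50)    # median
p99_ms = percentile(latencies_ms, 99)    # tail latency

print(f"average={avg_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

With these assumed numbers the average is 158 ms, close to the 100 ms of an ordinary request, while the 99th percentile is 3000 ms. A scaling policy that only watches the average could therefore look healthy while 2% of the requests are 30 times slower than the rest.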
