Adaptive AI-based auto-scaling for Kubernetes

Kubernetes, the prevalent container orchestrator for cloud-deployed web applications, offers an automatic scaling feature that lets application providers meet the ever-changing demand of their clients. This auto-scaling service, however, requires the application provider to tune a seemingly difficult set of parameters, and these management parameters are static while the dynamics of incoming web requests often change. Moreover, its scaling decisions are inherently reactive rather than proactive. Our ultimate goal is therefore to make the management of cloud-based web applications easier and more effective. We propose a Kubernetes scaling engine whose auto-scaling decisions are suited to the actual variability of incoming requests. In this engine, various AI-based forecast methods compete with each other via a short-term evaluation loop, so that the lead is always given, as soon as possible, to the method that best suits the current request dynamics. We also introduce a compact management parameter with which the cloud-tenant application provider can easily set its sweet spot in the trade-off between resource over-provisioning and SLA violations. The multi-forecast scaling engine and the proposed management parameter are evaluated both in simulations and with measurements on our collected web traces, showing that provisioned resources fit service demand more closely. We find that with just a few competing forecast methods, our auto-scaling engine, implemented in Kubernetes, loses significantly fewer requests while provisioning slightly more resources than the default baseline.
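To make the competition between forecasters and the single trade-off knob more concrete, here is a minimal Python sketch under stated assumptions. The abstract does not specify which forecast methods, error metric, window length, or parameter form the engine uses, so everything below is hypothetical: three toy forecasters (`last_value`, `moving_average`, `linear_trend`), a sliding-window mean absolute error as the short-term evaluation score, and a multiplicative `headroom` factor standing in for the compact management parameter. The hypothetical `CompetingForecaster` class scores each method's previous forecast against the newly observed request rate and returns the current leader's next forecast; the hypothetical `replicas_needed` turns that forecast into a pod count, assuming a fixed per-pod capacity `rps_per_pod`.

```python
import math
from collections import deque
from typing import Callable, Dict, List, Sequence

# Hypothetical toy forecasters: each maps the request-rate history
# to a one-step-ahead forecast. They stand in for the AI-based methods.
def last_value(history: Sequence[float]) -> float:
    return history[-1]

def moving_average(history: Sequence[float], k: int = 3) -> float:
    window = history[-k:]
    return sum(window) / len(window)

def linear_trend(history: Sequence[float], k: int = 4) -> float:
    # Least-squares line through the last k points, extrapolated one step ahead.
    ys = list(history[-k:])
    n = len(ys)
    if n < 2:
        return ys[-1]
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), ys))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)

class CompetingForecaster:
    """Keeps a short sliding window of absolute errors per method and lets
    the method with the lowest recent mean error lead (assumed scoring rule)."""

    def __init__(self, methods: Dict[str, Callable[[Sequence[float]], float]],
                 window: int = 5):
        self.methods = methods
        self.errors = {name: deque(maxlen=window) for name in methods}
        self.pending: Dict[str, float] = {}  # forecasts made for the next step
        self.history: List[float] = []

    def leader(self) -> str:
        # Methods with no scored forecast yet get +inf; ties go to the
        # first method in insertion order, because min() keeps the first.
        def score(name: str) -> float:
            errs = self.errors[name]
            return sum(errs) / len(errs) if errs else math.inf
        return min(self.methods, key=score)

    def observe(self, rate: float) -> float:
        # Short-term evaluation: score the forecasts made for this step...
        for name, predicted in self.pending.items():
            self.errors[name].append(abs(predicted - rate))
        self.history.append(rate)
        # ...then let every method forecast the next step, and use the leader's.
        self.pending = {name: f(self.history) for name, f in self.methods.items()}
        return self.pending[self.leader()]

def replicas_needed(forecast_rps: float, rps_per_pod: float, headroom: float) -> int:
    # Hypothetical single knob: headroom = 0.0 provisions exactly for the
    # forecast (more SLA risk); larger values over-provision more.
    return max(1, math.ceil(forecast_rps * (1.0 + headroom) / rps_per_pod))

if __name__ == "__main__":
    engine = CompetingForecaster(
        {"last": last_value, "avg": moving_average, "trend": linear_trend})
    trace = [100, 120, 140, 160, 180, 200, 220, 240]  # made-up rising load
    for rate in trace:
        forecast = engine.observe(rate)
    print(engine.leader(), forecast,
          replicas_needed(forecast, rps_per_pod=50, headroom=0.2))
```

On a steadily rising trace, the linear-trend method quickly takes the lead because its recent errors drop to zero while the lagging methods keep missing. A short evaluation window lets the leader change soon after the request dynamics shift, at the cost of more frequent switching on noisy traces; richer forecasters could be added to the same `methods` dictionary without changing the selection logic.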
