Fast Reinforcement Learning Algorithms for Resource Allocation in Data Centers

Dynamic resource allocation to satisfy varying, concurrent, and unpredictable demands from multiple applications is a key need in cloud systems. A fundamental challenge is balancing over-allocation, which satisfies each application’s varying needs without requiring frequent allocation changes, against system efficiency, which requires that the allocation match application needs exactly. Allocating resources close to current needs, however, results in frequent allocation changes. Such changes can be detrimental to applications because each one may impose fixed costs (state replication, policy reconfiguration, etc.) on the application. In this paper, we develop an MDP-based dynamic allocation scheme that uses reinforcement learning to satisfy unpredictable application demands. The scheme minimizes the overall resource allocation needed to satisfy varying application demands while meeting application constraints on the rate of allocation changes. We prove convergence bounds and use real-world traces to study the scheme's performance.
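The trade-off described above can be illustrated with a toy example. The sketch below is not the scheme developed in the paper. It assumes a small known Markov chain over demand levels (rather than a model learned from traces), replaces the hard constraint on the rate of allocation changes with a soft per-change penalty, and solves the resulting discounted MDP by plain value iteration. All names and parameters (`value_iteration`, `change_cost`, `shortfall_cost`, `gamma`) are hypothetical choices made for illustration.

```python
import numpy as np

def value_iteration(P, change_cost=2.0, shortfall_cost=10.0,
                    gamma=0.9, tol=1e-8, max_iters=10_000):
    """Toy discounted value iteration for allocation with change penalties.

    Assumptions (illustrative only, not taken from the paper):
      - demand takes levels 0..n-1 and evolves as a Markov chain with
        known transition matrix P[d, d_next];
      - the state is (a, d): current allocation a and current demand d;
      - the action a2 is the allocation held during the next step;
      - per-step cost = a2 + change_cost * [a2 != a]
                        + shortfall_cost * E[max(d_next - a2, 0)].
    Returns (V, policy), both indexed as [a, d].
    """
    n = P.shape[0]
    levels = np.arange(n)
    # shortfall[a2, d_next]: unmet demand when holding a2 and demand is d_next
    shortfall = np.maximum(levels[None, :] - levels[:, None], 0)
    # exp_short[d, a2]: expected unmet demand given current demand d
    exp_short = P @ shortfall.T
    # switch[a, a2]: fixed penalty paid whenever the allocation changes
    switch = change_cost * (levels[:, None] != levels[None, :])

    def q_values(V):
        # future[d, a2] = sum over d_next of P[d, d_next] * V[a2, d_next]
        future = P @ V.T
        # Q[a, d, a2], assembled by broadcasting the four cost terms
        return (levels[None, None, :]
                + switch[:, None, :]
                + shortfall_cost * exp_short[None, :, :]
                + gamma * future[None, :, :])

    V = np.zeros((n, n))
    for _ in range(max_iters):
        V_new = q_values(V).min(axis=2)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = q_values(V).argmin(axis=2)
    return V, policy

# Example: three demand levels with a "sticky" demand process.
P_demo = np.array([[0.8, 0.2, 0.0],
                   [0.1, 0.8, 0.1],
                   [0.0, 0.2, 0.8]])
V_demo, policy_demo = value_iteration(P_demo)
print(policy_demo)  # policy_demo[a, d] is the allocation chosen in state (a, d)
```

Raising `change_cost` in this toy model makes the computed policy hold an allocation longer before changing it, which mirrors the tension between efficiency and allocation stability that the abstract describes.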
