On the use of hybrid reinforcement learning for autonomic resource allocation

Abstract. Reinforcement learning (RL) offers a promising approach to systems performance management that differs radically from standard queuing-theoretic approaches, which rely on explicit system performance models. In principle, RL can automatically learn high-quality management policies without an explicit performance or traffic model, and with little or no built-in system-specific knowledge. In our original work (Das, R., Tesauro, G., Walsh, W.E.: IBM Research, Tech. Rep. RC23802 (2005), Tesauro, G.: In: Proc. of AAAI-05, pp. 886–891 (2005), Tesauro, G., Das, R., Walsh, W.E., Kephart, J.O.: In: Proc. of ICAC-05, pp. 342–343 (2005)) we showed the feasibility of using online RL to learn resource valuation estimates (in lookup-table form) that can be used to make high-quality server allocation decisions in a multi-application prototype data center scenario. The present work shows how to combine the strengths of RL and queuing models in a hybrid approach, in which RL trains offline on data collected while a queuing-model policy controls the system. Training offline avoids the potentially poor performance that live online training can suffer. We also now use RL to train nonlinear function approximators (e.g., multilayer perceptrons) instead of lookup tables; this enables scaling to substantially larger state spaces. Our results show that, under both open-loop and closed-loop traffic, hybrid RL training can achieve significant performance improvements over a variety of initial model-based policies. We also find that, as expected, RL can deal effectively with both transients and switching delays, which lie outside the scope of traditional steady-state queuing theory.
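The hybrid scheme the abstract describes can be illustrated with a minimal sketch: a temporal-difference learner fitted offline to (state, action, reward, next-state, next-action) transitions logged while an external model-based policy controlled the system, using a small multilayer perceptron as the value approximator. This is an illustrative reconstruction, not the paper's implementation; the feature encoding, network size, and toy transition data below are all hypothetical.

```python
# Hypothetical sketch of offline hybrid RL: SARSA(0) semi-gradient updates
# on logged transitions, with a one-hidden-layer MLP as Q approximator.
import math
import random

random.seed(0)

def mlp_init(n_in, n_hid):
    # Tanh hidden layer, linear output; small random weights.
    w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    w2 = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
    return w1, w2

def mlp_forward(params, x):
    w1, w2 = params
    h = [math.tanh(sum(wij * xj for wij, xj in zip(wi, x))) for wi in w1]
    return sum(v * hj for v, hj in zip(w2, h)), h

def sarsa_update(params, s, a, r, s2, a2, gamma=0.9, lr=0.05):
    # One semi-gradient step toward the SARSA(0) target r + gamma * Q(s', a').
    w1, w2 = params
    x = s + a                      # concatenated state-action features
    q, h = mlp_forward(params, x)
    q2, _ = mlp_forward(params, s2 + a2)
    err = (r + gamma * q2) - q     # TD error (target treated as constant)
    for j in range(len(w2)):
        grad_h = err * w2[j] * (1 - h[j] ** 2)  # backprop through tanh
        w2[j] += lr * err * h[j]
        for i in range(len(x)):
            w1[j][i] += lr * grad_h * x[i]
    return err

# Toy logged trace: (state, action, reward, next state, next action),
# as if recorded while a queuing-model policy made the allocation decisions.
logged = [([0.8], [1.0], 1.0, [0.5], [1.0]),
          ([0.5], [1.0], 0.5, [0.2], [0.0]),
          ([0.2], [0.0], 0.0, [0.8], [1.0])]

params = mlp_init(2, 4)
for _ in range(300):               # replay the fixed log repeatedly
    for t in logged:
        sarsa_update(params, *t)
```

The key property this sketch shares with the paper's approach is that the learner never acts on the live system during training: the controlling policy generated the log, and RL only fits values to it, after which the learned Q-function could replace or refine that policy.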
