Improvement of Systems Management Policies Using Hybrid Reinforcement Learning

Reinforcement Learning (RL) holds particular promise in the emerging application domain of performance management of computing systems. In recent work, online RL yielded effective server allocation policies in a prototype data center, without explicit system models or built-in domain knowledge. This paper presents a substantially improved and more practical “hybrid” approach, in which RL trains offline on data collected while a queuing-theoretic policy controls the system. This approach avoids the potentially poor performance of live online training. Additionally, we use nonlinear function approximators instead of tabular value functions; this greatly improves scalability and, surprisingly, eliminates the need for exploratory actions. In experiments using both open-loop and closed-loop traffic as well as large switching delays, our results show significant performance improvement over state-of-the-art queuing-model policies.
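As a rough illustration of the hybrid scheme described above (a sketch, not the authors' implementation), a fixed queuing-model policy can generate a log of state/action/reward transitions, and a small neural-network Q-function can then be trained offline on that log with SARSA(0)-style updates. Everything below is an assumption for illustration: the toy arrival-rate dynamics, the reward shape, the policy rule, and all hyperparameters.

```python
import math, random

random.seed(0)

ACTIONS = [1, 2, 3, 4, 5]  # candidate server allocations (illustrative)

def queuing_policy(rate):
    """Stand-in for the model-based queuing policy that generates the log."""
    return min(max(round(rate / 10.0), 1), 5)

def reward(rate, servers):
    # Illustrative utility: penalize under-provisioning and server cost.
    return -abs(rate / 10.0 - servers) - 0.1 * servers

def simulate_log(n=3000):
    """Collect transitions while the queuing policy controls the system."""
    log, rate = [], 20.0
    servers = queuing_policy(rate)
    for _ in range(n):
        rate = min(max(rate + random.gauss(0, 2.0), 5.0), 50.0)
        nxt = queuing_policy(rate)
        log.append(((rate, servers), servers, reward(rate, servers),
                    (rate, nxt), nxt))
        servers = nxt
    return log

# Tiny one-hidden-layer network Q(state, action), trained offline.
H, LR, GAMMA = 8, 1e-3, 0.5
W1 = [[random.gauss(0, 0.3) for _ in range(H)] for _ in range(3)]
b1 = [0.0] * H
W2 = [random.gauss(0, 0.3) for _ in range(H)]
b2 = 0.0

def q(state, action):
    x = [state[0] / 10.0, state[1] / 5.0, action / 5.0]  # crude scaling
    h = [math.tanh(sum(x[i] * W1[i][j] for i in range(3)) + b1[j])
         for j in range(H)]
    return sum(h[j] * W2[j] for j in range(H)) + b2, x, h

def sgd_step(state, action, target):
    """One squared-error gradient step toward the SARSA target."""
    global b2
    qv, x, h = q(state, action)
    err = qv - target
    for j in range(H):
        gh = err * W2[j] * (1 - h[j] ** 2)
        for i in range(3):
            W1[i][j] -= LR * x[i] * gh
        b1[j] -= LR * gh
        W2[j] -= LR * err * h[j]
    b2 -= LR * err

log = simulate_log()
for _ in range(15):  # repeated offline sweeps over the fixed log
    for s, a, r, s2, a2 in log:
        target = r + GAMMA * q(s2, a2)[0]  # SARSA(0): uses the logged a2
        sgd_step(s, a, target)

def learned_policy(rate, servers):
    """Greedy server allocation from the learned Q-function."""
    return max(ACTIONS, key=lambda a: q((rate, servers), a)[0])
```

Because training uses only the logged actions of the queuing policy, no exploratory actions are ever taken on the live system; the learned greedy policy is deployed only after offline training converges.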