Improvement of Systems Management Policies Using Hybrid Reinforcement Learning

Reinforcement Learning (RL) holds particular promise in the emerging application domain of performance management of computing systems. In recent work, online RL yielded effective server allocation policies in a prototype data center, without explicit system models or built-in domain knowledge. This paper presents a substantially improved and more practical “hybrid” approach, in which RL trains offline on data collected while a queuing-theoretic policy controls the system. This approach avoids the potentially poor performance of live online training. Additionally, we use nonlinear function approximators instead of tabular value functions; this greatly improves scalability and, surprisingly, eliminates the need for exploratory actions. In experiments using both open-loop and closed-loop traffic as well as large switching delays, our results show significant performance improvement over state-of-the-art queuing-model policies.
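As a rough illustration of the hybrid scheme described above (a sketch, not the authors' implementation), a fixed queuing-model policy can generate a log of state/action/reward transitions, and a small neural-network Q-function can then be trained offline on that log with SARSA(0)-style updates. Everything below is an assumption for illustration: the toy arrival-rate dynamics, the reward shape, the policy rule, and all hyperparameters.

```python
import math, random

random.seed(0)

ACTIONS = [1, 2, 3, 4, 5]  # candidate server allocations (illustrative)

def queuing_policy(rate):
    """Stand-in for the model-based queuing policy that generates the log."""
    return min(max(round(rate / 10.0), 1), 5)

def reward(rate, servers):
    # Illustrative utility: penalize under-provisioning and server cost.
    return -abs(rate / 10.0 - servers) - 0.1 * servers

def simulate_log(n=3000):
    """Collect transitions while the queuing policy controls the system."""
    log, rate = [], 20.0
    servers = queuing_policy(rate)
    for _ in range(n):
        rate = min(max(rate + random.gauss(0, 2.0), 5.0), 50.0)
        nxt = queuing_policy(rate)
        log.append(((rate, servers), servers, reward(rate, servers),
                    (rate, nxt), nxt))
        servers = nxt
    return log

# Tiny one-hidden-layer network Q(state, action), trained offline.
H, LR, GAMMA = 8, 1e-3, 0.5
W1 = [[random.gauss(0, 0.3) for _ in range(H)] for _ in range(3)]
b1 = [0.0] * H
W2 = [random.gauss(0, 0.3) for _ in range(H)]
b2 = 0.0

def q(state, action):
    x = [state[0] / 10.0, state[1] / 5.0, action / 5.0]  # crude scaling
    h = [math.tanh(sum(x[i] * W1[i][j] for i in range(3)) + b1[j])
         for j in range(H)]
    return sum(h[j] * W2[j] for j in range(H)) + b2, x, h

def sgd_step(state, action, target):
    """One squared-error gradient step toward the SARSA target."""
    global b2
    qv, x, h = q(state, action)
    err = qv - target
    for j in range(H):
        gh = err * W2[j] * (1 - h[j] ** 2)
        for i in range(3):
            W1[i][j] -= LR * x[i] * gh
        b1[j] -= LR * gh
        W2[j] -= LR * err * h[j]
    b2 -= LR * err

log = simulate_log()
for _ in range(15):  # repeated offline sweeps over the fixed log
    for s, a, r, s2, a2 in log:
        target = r + GAMMA * q(s2, a2)[0]  # SARSA(0): uses the logged a2
        sgd_step(s, a, target)

def learned_policy(rate, servers):
    """Greedy server allocation from the learned Q-function."""
    return max(ACTIONS, key=lambda a: q((rate, servers), a)[0])
```

Because training uses only the logged actions of the queuing policy, no exploratory actions are ever taken on the live system; the learned greedy policy is deployed only after offline training converges.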