Online Resource Allocation Using Decompositional Reinforcement Learning

This paper considers a novel application domain for reinforcement learning (RL): "autonomic computing," i.e., self-managing computing systems. RL is applied to an online resource allocation task in a distributed multi-application computing environment with independent time-varying load in each application. The task is to allocate servers in real time so as to maximize the sum, over all applications, of performance-based expected utility. This task may be treated as a composite MDP, and to exploit the problem structure, a simple localized RL approach is proposed, with better scalability than previous approaches. The RL approach is tested in a realistic prototype data center comprising real servers, real HTTP requests, and realistic time-varying demand. This domain poses a number of major challenges associated with live training in a real system, including the need for rapid training, exploration that avoids excessive penalties, and the handling of complex, potentially non-Markovian system effects. The early results are encouraging: in overnight training, RL performs as well as or slightly better than heavily researched model-based approaches derived from queuing theory.
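
To make the allocation objective concrete, the following is a minimal illustrative sketch, not the method evaluated in the paper. It assumes each application keeps its own table of learned value estimates indexed by server count, that every application receives at least one server, and that a central step picks the split maximizing the summed estimates. The names `best_allocation` and `local_update`, the table layout, and the Sarsa(0)-style update with step size `alpha` and discount `gamma` are all hypothetical choices made for illustration.

```python
from itertools import product


def best_allocation(value_tables, total_servers):
    """Pick per-application server counts that maximize the summed local estimates.

    value_tables[i][n] is application i's estimate of expected utility with
    n servers (hypothetical layout; index 0 is unused because we assume each
    application receives at least one server).
    """
    best, best_total = None, float("-inf")
    counts = [range(1, total_servers + 1)] * len(value_tables)
    # Brute-force enumeration: fine for a sketch, exponential in the number
    # of applications, so not meant for large instances.
    for alloc in product(*counts):
        if sum(alloc) != total_servers:
            continue  # keep only splits that use every server
        total = sum(table[n] for table, n in zip(value_tables, alloc))
        if total > best_total:
            best, best_total = alloc, total
    return best, best_total


def local_update(table, n, reward, n_next, alpha=0.1, gamma=0.5):
    """Sarsa(0)-style update of one application's table from its own reward.

    Assumption for illustration: the local state is just the server count.
    """
    table[n] += alpha * (reward + gamma * table[n_next] - table[n])


# Two applications sharing four servers (made-up value estimates).
tables = [
    [0.0, 1.0, 1.5, 1.7, 1.8],
    [0.0, 0.5, 1.6, 2.4, 2.5],
]
allocation, total_value = best_allocation(tables, total_servers=4)
print(allocation)  # (1, 3)
```

Because each table is updated only from its own application's reward, the learning problem grows with the number of applications rather than with the size of the joint state space; only the final maximization couples them.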