Online Resource Allocation Using Decompositional Reinforcement Learning

This paper considers a novel application domain for reinforcement learning (RL): "autonomic computing," i.e., self-managing computing systems. RL is applied to an online resource allocation task in a distributed, multi-application computing environment in which each application experiences independent, time-varying load. The task is to allocate servers in real time so as to maximize the sum, across applications, of performance-based expected utility. This task may be treated as a composite MDP. To exploit the problem structure, a simple localized RL approach is proposed that scales better than previous approaches. The RL approach is tested in a realistic prototype data center comprising real servers, real HTTP requests, and realistic time-varying demand. This domain poses several major challenges associated with live training in a real system, including the need for rapid training, exploration that avoids excessive penalties, and the handling of complex, potentially non-Markovian system effects. The early results are encouraging: in overnight training, RL performs as well as or slightly better than heavily researched model-based approaches derived from queuing theory.
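One way to picture a "localized" decomposition of the composite MDP follows. Each application keeps its own value estimate Q_i(s_i, n) for its local state s_i and server count n. A global step then picks the server counts that maximize the summed local values. This is a minimal sketch under stated assumptions, not the paper's implementation. The abstract does not name the learning rule, so tabular SARSA(0) is assumed here. The names `LocalLearner`, `sarsa_update`, and `allocate`, the discretized states, and the `alpha`/`gamma` values are all hypothetical.

```python
from collections import defaultdict


class LocalLearner:
    """Hypothetical per-application learner (an assumption, not the paper's code).

    Holds a tabular estimate Q(s, n): the expected utility of this application
    when its local state is s and it is assigned n servers.
    """

    def __init__(self, alpha=0.2, gamma=0.5):
        self.q = defaultdict(float)  # unseen (state, n) pairs default to 0.0
        self.alpha = alpha           # learning rate (illustrative value)
        self.gamma = gamma           # discount factor (illustrative value)

    def value(self, state, n):
        return self.q[(state, n)]

    def sarsa_update(self, s, n, reward, s_next, n_next):
        # Assumed SARSA(0) rule: move Q(s, n) toward r + gamma * Q(s', n'),
        # using only this application's own state, allocation, and reward.
        target = reward + self.gamma * self.q[(s_next, n_next)]
        self.q[(s, n)] += self.alpha * (target - self.q[(s, n)])


def allocate(learners, states, total_servers):
    """Pick counts n_i with sum(n_i) <= total_servers maximizing sum_i Q_i(s_i, n_i).

    Knapsack-style dynamic programming over applications:
    best[k] = (best summed value using exactly k servers so far, allocation).
    """
    best = {0: (0.0, [])}
    for learner, s in zip(learners, states):
        nxt = {}
        for used, (val, alloc) in best.items():
            for n in range(total_servers - used + 1):
                cand = val + learner.value(s, n)
                key = used + n
                if key not in nxt or cand > nxt[key][0]:
                    nxt[key] = (cand, alloc + [n])
        best = nxt
    # Taking the max over every k allows leaving some servers unassigned.
    return max(best.values(), key=lambda t: t[0])[1]


if __name__ == "__main__":
    a, b = LocalLearner(), LocalLearner()
    for n, v in enumerate([0.0, 5.0, 9.0, 10.0]):
        a.q[("high_load", n)] = v
    for n, v in enumerate([0.0, 4.0, 6.0, 7.0]):
        b.q[("low_load", n)] = v
    print(allocate([a, b], ["high_load", "low_load"], 3))  # → [2, 1]
```

Each application's learner only sees its own state and reward, so the number of value-table entries grows with the sum of the per-application state spaces rather than their product. The coupling between applications is handled entirely in `allocate`. The dynamic program is only one exact way to maximize the sum under a fixed server budget. The abstract does not say how the combination step is performed.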

Gerald Tesauro