On the Difficulty of Modular Reinforcement Learning for Real-World Partial Programming

In recent years there has been a great deal of interest in "modular reinforcement learning" (MRL). Typically, a problem is decomposed into concurrent subgoals, which increases scalability and allows state abstraction. An arbitrator combines the subagents' preferences to select an action. In this work, we contrast two views of an MRL agent: as a set of subagents that share the same goal, and as a set of subagents that may have different, possibly conflicting goals. We argue that the latter better describes real-world problems, especially when building partial programs. We examine a range of algorithms for single-goal MRL and, leveraging social choice theory, present an impossibility result for applying such algorithms to multigoal MRL. We suggest an alternative formulation of arbitration as scheduling, which avoids the assumption of comparable preferences that is implicit in single-goal MRL. A notable feature of this formulation is that it codifies the tradeoffs between subproblems explicitly. Finally, we introduce A2BL, a language that encapsulates many of these ideas.
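The comparability assumption can be made concrete with a small sketch. It assumes one common style of single-goal arbitrator, which sums the subagents' Q-values for the current state and picks the action with the largest total. The subagent names ("food", "safety"), actions, and values are hypothetical and used only for illustration; this is not A2BL or the paper's own method. Multiplying one subagent's values by a positive constant leaves that subagent's ranking of actions unchanged, yet it flips the arbitrator's choice. A sum of Q-values is therefore only meaningful when the subagents' scales are comparable, which holds when they share one reward signal but not, in general, when their goals differ.

```python
from typing import Dict, List

# Hypothetical representation: each subagent maps actions to its own
# Q-value estimates for the current state.
QTable = Dict[str, float]


def sum_arbitrator(subagents: List[QTable]) -> str:
    """Pick the action that maximizes the sum of the subagents' Q-values."""
    actions = subagents[0].keys()
    return max(actions, key=lambda a: sum(q[a] for q in subagents))


def rescale(q: QTable, scale: float, shift: float = 0.0) -> QTable:
    """Positive affine transform; it preserves this subagent's own ranking of actions."""
    if scale <= 0:
        raise ValueError("scale must be positive to preserve the ranking")
    return {a: scale * v + shift for a, v in q.items()}


# Two subagents with conflicting goals (illustrative values only).
food = {"left": 1.0, "right": 0.0}
safety = {"left": 0.0, "right": 0.6}

print(sum_arbitrator([food, safety]))                # -> left  (1.0 vs 0.6)
print(sum_arbitrator([food, rescale(safety, 2.0)]))  # -> right (1.0 vs 1.2)
```

Both runs describe the same individual preferences (food prefers left, safety prefers right), so the change in outcome comes entirely from the arbitrary choice of reward scale.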

Michael Mateas | Charles Lee Isbell | Sooraj Bhat
