Improving the performance of complex agent plans through reinforcement learning

Agent programming in complex, partially observable, and stochastic domains usually requires a great deal of understanding of both the domain and the task in order to provide the agent with the knowledge necessary to act effectively. While symbolic methods allow the designer to specify declarative knowledge about the domain, the resulting plan can be brittle, since it is difficult to supply a symbolic model accurate enough to foresee all possible events in complex environments, especially under partial observability. Reinforcement Learning (RL) techniques, on the other hand, can learn a policy and make use of a learned model, but it is difficult to reduce and shape the scope of the learning algorithm by exploiting a priori information. We propose a methodology for writing complex agent programs that can be effectively improved through experience. We show how to derive a stochastic process from a partial specification of the plan, so that the plan's performance can be improved by solving an RL problem much smaller than those of classical RL formulations. Finally, we demonstrate our approach on Keepaway Soccer, a common RL benchmark based on the RoboCup Soccer 2D simulator.
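
As a rough illustration of the idea, and not the paper's actual implementation, the sketch below writes a keeper plan in which every behaviour is fixed except a single choice point (what to do when in possession of the ball) and improves that choice with SARSA over the small decision process the plan induces. The environment interface (reset, step, at_choice_point), the choice set, the feature discretisation, and the learning parameters are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Hedged sketch: learning is confined to the plan's single open choice point;
# the rest of the plan is executed by fixed sub-behaviours.
CHOICES = ["hold", "pass_nearest", "pass_farthest"]   # assumed choice set
ALPHA, GAMMA, EPSILON = 0.125, 1.0, 0.05              # assumed learning parameters

Q = defaultdict(float)          # Q-values over (abstract state, choice) pairs

def abstract_state(features, bins=8):
    """Coarsely discretise the plan-relevant features (distances, angles)."""
    return tuple(min(int(f * bins), bins - 1) for f in features)

def select(state):
    """Epsilon-greedy selection among the open choices of the plan."""
    if random.random() < EPSILON:
        return random.choice(CHOICES)
    return max(CHOICES, key=lambda c: Q[(state, c)])

def run_episode(env):
    """Execute the plan for one episode; learning happens only at choice points."""
    features, done = env.reset(), False
    prev = None                          # (state, choice, accumulated reward)
    while not done:
        if env.at_choice_point():        # keeper has the ball: plan leaves this open
            state = abstract_state(features)
            choice = select(state)
            if prev is not None:
                s0, c0, r = prev
                # SARSA update on the decision process induced by the choice points
                Q[(s0, c0)] += ALPHA * (r + GAMMA * Q[(state, choice)] - Q[(s0, c0)])
            features, reward, done = env.step(choice)
            prev = (state, choice, reward)
        else:
            # fixed sub-plans (e.g. GetOpen, GoToBall) handle every other situation
            features, reward, done = env.step("follow_plan")
            if prev is not None:
                s0, c0, r = prev
                prev = (s0, c0, r + reward)   # accumulate reward until next choice
    if prev is not None:                      # terminal update at episode end
        s0, c0, r = prev
        Q[(s0, c0)] += ALPHA * (r - Q[(s0, c0)])
```

The only point of the sketch is that the learner sees just the abstract states and open choices exposed by the plan, so the resulting state-action space is far smaller than that of the full Keepaway control problem.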
