Efficient Reinforcement Learning with Relocatable Action Models

Realistic domains for learning possess regularities that make it possible to generalize experience across related states. This paper explores an environment-modeling framework that represents transitions as state-independent outcomes that are common to all states that share the same type. We analyze a set of novel learning problems that arise in this framework, providing lower and upper bounds. We single out one particular variant of practical interest and provide an efficient algorithm and experimental results in both simulated and robotic environments.

[1]  Reid G. Simmons,et al.  Complexity Analysis of Real-Time Reinforcement Learning , 1993, AAAI.

[2]  Claude-Nicolas Fiechter,et al.  Efficient reinforcement learning , 1994, COLT '94.

[3]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[4]  Gerald Tesauro,et al.  Temporal difference learning and TD-Gammon , 1995, CACM.

[5]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[6]  Michael Kearns,et al.  Efficient Reinforcement Learning in Factored MDPs , 1999, IJCAI.

[7]  Dale Schuurmans,et al.  Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs , 2002, ICML.

[8]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[9]  Sham M. Kakade,et al.  On the sample complexity of reinforcement learning. , 2003 .

[10]  John Langford,et al.  Exploration in Metric State Spaces , 2003, ICML.

[11]  Michael Kearns,et al.  Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.

[12]  Thomas J. Walsh,et al.  Efficient Exploration With Latent Structure , 2005, Robotics: Science and Systems.

[13]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[14]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[15]  Peter Stone,et al.  Improving Action Selection in MDP's via Knowledge Transfer , 2005, AAAI.

[16]  Lihong Li,et al.  Incremental Model-based Learners With Formal Learning-Time Guarantees , 2006, UAI.