CORL: A Continuous-state Offset-dynamics Reinforcement Learner

Continuous state spaces and stochastic, switching dynamics characterize a number of rich, real-world domains, such as robot navigation across varying terrain. We describe a reinforcement-learning algorithm for learning in these domains and prove for certain environments the algorithm is probably approximately correct with a sample complexity that scales polynomially with the state-space dimension. Unfortunately, no optimal planning techniques exist in general for such problems; instead we use fitted value iteration to solve the learned MDP, and include the error due to approximate planning in our bounds. Finally, we report an experiment using a robotic car driving over varying terrain to demonstrate that these dynamics representations adequately capture real-world dynamics and that our algorithm can be used to efficiently solve such problems.

[1]  Geoffrey J. Gordon Stable Function Approximation in Dynamic Programming , 1995, ICML.

[2]  Michael Kearns,et al.  Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.

[3]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[4]  Lihong Li,et al.  Incremental Model-based Learners With Formal Learning-Time Guarantees , 2006, UAI.

[5]  S. Shankar Sastry,et al.  Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.

[6]  J. Tsitsiklis,et al.  An optimal multigrid algorithm for continuous state discrete time stochastic control , 1988, Proceedings of the 27th IEEE Conference on Decision and Control.

[7]  Michael L. Littman,et al.  Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.

[8]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[9]  Michael L. Littman,et al.  Online Linear Regression and Its Application to Model-Based Reinforcement Learning , 2007, NIPS.

[10]  Bethany R. Leffler,et al.  Efficient Learning of Dynamics Models using Terrain Classification , 2008 .

[11]  Sham M. Kakade,et al.  On the sample complexity of reinforcement learning. , 2003 .

[12]  Pieter Abbeel,et al.  Exploration and apprenticeship learning in reinforcement learning , 2005, ICML.

[13]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..