Reinforcement Planning: RL for optimal planners

Search-based planners such as A* and Dijkstra's algorithm are proven methods for guiding today's robotic systems. Although such planners are typically based on a coarse approximation of reality, they are nonetheless valuable due to their ability to reason about the future and to generalize to previously unseen scenarios. However, encoding the desired behavior of a system into the underlying cost function used by the planner can be a tedious and error-prone task. We introduce Reinforcement Planning, which extends gradient-based reinforcement learning algorithms to automatically learn useful surrogate cost functions for optimal planners. Reinforcement Planning offers several advantages over other learning approaches to planning: it is not limited by the expertise of a human demonstrator, and it explicitly acknowledges that the planner's domain is a simplified model of the world. We demonstrate the effectiveness of our method by learning to solve a noisy physical simulation of the well-known "marble maze" toy.
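
The loop implied by the abstract, plan with a surrogate cost function, execute the plan in a noisy world, and nudge the cost parameters toward higher return, can be illustrated with a minimal sketch. This is not the authors' exact algorithm: it assumes a hypothetical grid world with made-up terrain features, a linear-in-features surrogate edge cost, Dijkstra's algorithm as the planner, and a simple REINFORCE-style score-function gradient obtained by perturbing the cost parameters.

```python
# Minimal, illustrative sketch (not the paper's exact method): learn the weights
# of a surrogate edge-cost function for Dijkstra's algorithm with a simple
# score-function (REINFORCE-style) gradient estimated by perturbing the
# cost-function parameters. Grid world, features, and rewards are hypothetical.
import heapq
import numpy as np

GRID = 8                                  # side length of a toy grid world
FEATS = 2                                 # features per cell, e.g. [roughness, obstacle proximity]
rng = np.random.default_rng(0)
features = rng.random((GRID, GRID, FEATS))
START, GOAL = (0, 0), (GRID - 1, GRID - 1)

def edge_cost(cell, w):
    """Surrogate cost of entering `cell`; exp keeps costs strictly positive."""
    return float(np.exp(features[cell] @ w))

def dijkstra(w):
    """Shortest path from START to GOAL under the surrogate costs."""
    dist, prev = {START: 0.0}, {}
    pq = [(0.0, START)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == GOAL:
            break
        if d > dist[u]:
            continue
        for du, dv in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + du, u[1] + dv)
            if 0 <= v[0] < GRID and 0 <= v[1] < GRID:
                nd = d + edge_cost(v, w)
                if nd < dist.get(v, np.inf):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
    path, node = [GOAL], GOAL
    while node != START:
        node = prev[node]
        path.append(node)
    return path[::-1]

def rollout(w):
    """Execute the planned path in a noisy 'world'; the true (hidden) reward
    penalizes rough terrain and path length, plus a bonus for reaching the goal."""
    path = dijkstra(w)
    reward = 0.0
    for cell in path[1:]:
        reward -= 1.0 + 5.0 * features[cell][0] + 0.1 * rng.standard_normal()
    return reward + 20.0

# REINFORCE-style update on the cost-function parameters: sample a perturbed
# parameter vector, score it by planning and executing, and move the mean
# parameters toward perturbations that earned higher return.
w, sigma, lr = np.zeros(FEATS), 0.3, 0.02
baseline = rollout(w)
for it in range(200):
    eps = sigma * rng.standard_normal(FEATS)
    r = rollout(w + eps)
    w += lr * (r - baseline) * eps / sigma**2     # score-function gradient step
    baseline = 0.9 * baseline + 0.1 * r           # running baseline reduces variance
print("learned cost weights:", w)
```

In the paper's setting the gradient flows through the planner itself rather than through parameter perturbations; the sketch above only shows the overall plan-execute-update loop under that simplifying assumption.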
