Sensorimotor abstraction selection for efficient, autonomous robot skill acquisition

To achieve truly autonomous robot skill acquisition, a robot can use neither a single large general state space (because learning there is not feasible) nor a small problem-specific state space (because it does not generalize). We propose that a robot should instead maintain a set of sensorimotor abstractions, each serving as a small candidate state space, and select an appropriate one when it decides to learn a new skill. We introduce an incremental algorithm that, given a successful sample trajectory, selects the state space in which to learn a skill from among a set of candidate spaces. The algorithm returns a policy fitted to that trajectory in the selected state space, so that learning does not have to begin from scratch. We demonstrate that the algorithm selects an appropriate space for a sequence of demonstration skills on a physically realistic simulated mobile robot, and that the resulting initial policies closely match the sample trajectory.
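
A minimal sketch of one way such a selection step could look (an illustration under assumptions, not the paper's actual procedure): each candidate abstraction projects the demonstrated trajectory into its own feature space, a simple linear policy is fit to the demonstrated actions there, and the abstraction whose fit best trades off accuracy against dimensionality is returned along with the fitted policy. The Abstraction interface (project, dim), the linear least-squares policy, and the BIC-style complexity penalty are all assumptions made for this sketch.

```python
import numpy as np

def select_abstraction(abstractions, trajectory, complexity_weight=1.0):
    """Choose the candidate state space that best explains a demonstrated trajectory.

    abstractions : objects assumed (for this sketch) to expose
        .project(observation) -> feature vector in that abstraction's state space
        .dim                  -> number of features in that space
    trajectory   : list of (observation, action) pairs from a successful demonstration
    Returns (best_abstraction, fitted_policy_weights).
    """
    best = None
    for abstraction in abstractions:
        # Project the demonstration into this candidate state space.
        X = np.array([abstraction.project(obs) for obs, _ in trajectory])
        y = np.array([act for _, act in trajectory])

        # Fit a simple linear policy to the demonstrated actions by least squares.
        W, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        fit_error = float(np.sum((X @ W - y) ** 2))

        # Penalize larger abstractions so that a small space which still explains
        # the trajectory is preferred (a BIC-like accuracy/complexity trade-off).
        score = fit_error + complexity_weight * abstraction.dim * np.log(len(trajectory))

        if best is None or score < best[0]:
            best = (score, abstraction, W)

    _, abstraction, W = best
    return abstraction, W
```

The returned weights could then seed learning in the selected space, in the spirit of the paper's point that the policy fitted to the sample trajectory lets skill learning start from the demonstration rather than from scratch.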
