Reinforcement learning for imitating constrained reaching movements

The goal of developing algorithms for programming robots by demonstration is to make robot programming simple enough to be accomplished by anyone. When demonstrators teach a task to a robot, they show some ways of fulfilling the task, but not all of the possibilities. The robot must then be able to reproduce the task even when unexpected perturbations occur, in which case it has to learn a new solution. Here, we describe a system for teaching a robot constrained reaching tasks. Our system is based on a dynamical system generator modulated by a learned speed trajectory. This generator is combined with a reinforcement learning module that allows the robot to adapt the trajectory when facing a new situation, e.g., in the presence of obstacles.
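To make the described architecture concrete, below is a minimal Python sketch of the general idea: a spring-damper point attractor generates a reaching motion toward the goal, its velocity is rescaled by a speed profile encoded as weighted Gaussian basis functions, and an episodic finite-difference policy-gradient update adapts the profile weights when an obstacle penalizes the reproduced path. Everything here (the gains, basis functions, reward, update rule, and names such as `speed_profile` and `rollout`) is an illustrative assumption, not the authors' exact formulation.

```python
# Minimal sketch (illustrative assumptions throughout): an attractor
# dynamics generator modulated by a learned speed profile, adapted by a
# simple episodic policy-gradient step when an obstacle is present.
import numpy as np

rng = np.random.default_rng(0)

GOAL = np.array([1.0, 0.5])        # reaching target (assumed)
OBSTACLE = np.array([0.5, 0.25])   # obstacle center (assumed)
ALPHA, BETA = 4.0, 1.0             # attractor gains (assumed)
DT, STEPS = 0.01, 300

def speed_profile(t, w, centers, width=0.3):
    """Speed modulation: 1 plus a weighted sum of Gaussian basis functions."""
    phi = np.exp(-((t - centers) ** 2) / (2 * width ** 2))
    return np.clip(1.0 + phi @ w, 0.1, 5.0)  # keep the speed positive

def rollout(w, centers):
    """Integrate the modulated attractor dynamics; return the path."""
    x = np.zeros(2)
    v = np.zeros(2)
    path = [x.copy()]
    for k in range(STEPS):
        t = k * DT
        a = ALPHA * (BETA * (GOAL - x) - v)  # spring-damper toward the goal
        v = v + a * DT
        x = x + speed_profile(t, w, centers) * v * DT
        path.append(x.copy())
    return np.array(path)

def reward(path):
    """Reward reaching the goal while staying clear of the obstacle."""
    goal_cost = np.linalg.norm(path[-1] - GOAL)
    d = np.linalg.norm(path - OBSTACLE, axis=1)
    obstacle_cost = np.sum(np.exp(-d / 0.05))  # penalize close passes
    return -goal_cost - 0.01 * obstacle_cost

# Episodic adaptation of the speed-profile weights: sample a symmetric
# perturbation, roll out both variants, and take a finite-difference
# policy-gradient step (REINFORCE-style) along the sampled direction.
centers = np.linspace(0.0, STEPS * DT, 10)
w = np.zeros(10)
sigma = 0.2
for episode in range(200):
    eps = sigma * rng.standard_normal(w.shape)
    r_plus = reward(rollout(w + eps, centers))
    r_minus = reward(rollout(w - eps, centers))
    w += 0.05 * (r_plus - r_minus) * eps / (2 * sigma ** 2)

print("final reward:", reward(rollout(w, centers)))
```

Separating the attractor (which keeps the motion converging to the goal) from the learned modulation lets the learning module explore trajectory shapes without losing the reaching behavior, mirroring the division of labor between the dynamical system generator and the reinforcement learning module described above.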
