Learning mixed behaviours with parallel Q-learning

This paper presents a reinforcement learning algorithm based on a parallel version of Watkins's Q-learning. The algorithm is used to control a two-axis micro-manipulator system. The aim is to learn complex behaviours, such as reaching target positions while avoiding obstacles. Simulations and tests on the real manipulator show that the algorithm can simultaneously learn opposing behaviours and that it generates action policies that perform well with respect to global path optimization.
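The general idea of combining parallel Q-learners can be illustrated with a minimal sketch. This is not the paper's implementation: the 4x4 gridworld, the reward values, and the summed-Q action selection are all assumptions made for illustration. Two tabular learners share the same experience, one rewarded for reaching a goal and one penalised for entering an obstacle cell; each is updated in parallel with Watkins's Q-learning rule, and the greedy action maximises the sum of the two tables.

```python
import random

# Hypothetical 4x4 gridworld (illustrative only, not the paper's setup).
W, H = 4, 4
GOAL, OBST = (3, 3), (1, 1)
ACTIONS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # right, left, up, down
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.3

def step(s, a):
    # Deterministic move, clipped at the grid borders.
    x = min(max(s[0] + a[0], 0), W - 1)
    y = min(max(s[1] + a[1], 0), H - 1)
    return (x, y)

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    # One Q-table per elementary behaviour, learned in parallel.
    q_goal = {(x, y): [0.0] * 4 for x in range(W) for y in range(H)}
    q_avoid = {(x, y): [0.0] * 4 for x in range(W) for y in range(H)}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(50):
            if rng.random() < EPS:
                a = rng.randrange(4)
            else:
                # Greedy on the combined (summed) Q-values.
                a = max(range(4), key=lambda i: q_goal[s][i] + q_avoid[s][i])
            s2 = step(s, ACTIONS[a])
            r_goal = 1.0 if s2 == GOAL else 0.0   # reward for reaching the target
            r_avoid = -1.0 if s2 == OBST else 0.0  # penalty for hitting the obstacle
            # Two Watkins updates in parallel, one per behaviour,
            # from the same (s, a, s2) transition.
            for q, r in ((q_goal, r_goal), (q_avoid, r_avoid)):
                q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
            if s == GOAL:
                break
    return q_goal, q_avoid

def greedy_path(q_goal, q_avoid):
    # Follow the combined greedy policy from the start state.
    s, path = (0, 0), [(0, 0)]
    while s != GOAL and len(path) < 30:
        a = max(range(4), key=lambda i: q_goal[s][i] + q_avoid[s][i])
        s = step(s, ACTIONS[a])
        path.append(s)
    return path
```

After training, the greedy path on the summed Q-values reaches the goal while routing around the obstacle cell, which is the kind of mixed reach-and-avoid behaviour the abstract describes.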
