Reinforcement Learning of Robotic Motion with Genetic Programming, Simulated Annealing and Self-Organizing Map

Reinforcement learning, a sub-area of machine learning, acquires a near-optimal policy by actively exploring feasible actions while exploiting previously learned reward experiences. A Q-table over all state-action pairs forms the basis of the policy of taking the optimal action in each state, but building a Q-table of considerable size requires an enormous amount of learning time. Moreover, Q-learning can only be applied to problems with discrete state and action spaces. This study proposes a genetic programming method with simulated annealing that acquires a fairly good program for an agent, which then serves as a basis for further improvement and adapts to the constraints of the environment. We also propose an implementation of Q-learning that handles continuous state and action spaces by using a Self-Organizing Map (SOM). An experiment was conducted by simulating a robotic task with the Player/Stage/Gazebo (PSG) simulator. Experimental results showed that the proposed approaches were both effective and efficient.
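To illustrate the idea of combining a SOM with tabular Q-learning, the sketch below shows one minimal way a SOM could quantize continuous sensor readings into discrete states that index a Q-table. This is an assumption-laden illustration, not the paper's actual implementation; all class names, parameters, and the simplified SOM update (no neighborhood function) are hypothetical.

```python
# Minimal sketch (not the paper's implementation): Q-learning over discrete
# states produced by a Self-Organizing Map from continuous observations.
import numpy as np

class SOMQLearner:
    def __init__(self, n_units=25, input_dim=8, n_actions=4,
                 alpha=0.1, gamma=0.9, epsilon=0.1, som_lr=0.05):
        # SOM codebook: one weight vector per unit, living in sensor space.
        self.codebook = np.random.rand(n_units, input_dim)
        # One Q-table row per SOM unit (discrete state), one column per action.
        self.q_table = np.zeros((n_units, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.som_lr = som_lr
        self.n_actions = n_actions

    def state_of(self, observation):
        # Best-matching unit serves as the discrete state for a continuous
        # observation; the winner is nudged toward the observation
        # (simplified SOM update without a neighborhood function).
        distances = np.linalg.norm(self.codebook - observation, axis=1)
        bmu = int(np.argmin(distances))
        self.codebook[bmu] += self.som_lr * (observation - self.codebook[bmu])
        return bmu

    def choose_action(self, state):
        # Epsilon-greedy action selection over the Q-table row.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_table[state]))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup.
        best_next = np.max(self.q_table[next_state])
        td_target = reward + self.gamma * best_next
        self.q_table[state, action] += self.alpha * (
            td_target - self.q_table[state, action])
```

Under these assumptions, the agent would call `state_of` on each sensor reading, act with `choose_action`, and apply `update` after observing the reward and the next quantized state.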
