Reinforcement Learning of Robotic Motion with Genetic Programming, Simulated Annealing and Self-Organizing Map

Reinforcement learning, a sub-area of machine learning, acquires a near-optimal policy by actively exploring feasible actions while exploiting previously learned reward experiences. A Q-table over all state-action pairs forms the basis of the policy of taking the optimal action in each state, but building a Q-table of considerable size requires an enormous amount of learning time. Moreover, Q-learning can only be applied to problems with discrete state and action spaces. This study proposes a genetic programming method with simulated annealing that acquires a fairly good program for an agent, which then serves as a basis for further improvement and adapts to the constraints of the environment. We also propose an implementation of Q-learning that handles continuous state and action spaces by using a Self-Organizing Map (SOM). An experiment was conducted by simulating a robotic task with the Player/Stage/Gazebo (PSG) simulator. Experimental results showed that the proposed approaches were both effective and efficient.
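To illustrate the idea of combining a SOM with tabular Q-learning, the sketch below shows one minimal way a SOM could quantize continuous sensor readings into discrete states that index a Q-table. This is an assumption-laden illustration, not the paper's actual implementation; all class names, parameters, and the simplified SOM update (no neighborhood function) are hypothetical.

```python
# Minimal sketch (not the paper's implementation): Q-learning over discrete
# states produced by a Self-Organizing Map from continuous observations.
import numpy as np

class SOMQLearner:
    def __init__(self, n_units=25, input_dim=8, n_actions=4,
                 alpha=0.1, gamma=0.9, epsilon=0.1, som_lr=0.05):
        # SOM codebook: one weight vector per unit, living in sensor space.
        self.codebook = np.random.rand(n_units, input_dim)
        # One Q-table row per SOM unit (discrete state), one column per action.
        self.q_table = np.zeros((n_units, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.som_lr = som_lr
        self.n_actions = n_actions

    def state_of(self, observation):
        # Best-matching unit serves as the discrete state for a continuous
        # observation; the winner is nudged toward the observation
        # (simplified SOM update without a neighborhood function).
        distances = np.linalg.norm(self.codebook - observation, axis=1)
        bmu = int(np.argmin(distances))
        self.codebook[bmu] += self.som_lr * (observation - self.codebook[bmu])
        return bmu

    def choose_action(self, state):
        # Epsilon-greedy action selection over the Q-table row.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_table[state]))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup.
        best_next = np.max(self.q_table[next_state])
        td_target = reward + self.gamma * best_next
        self.q_table[state, action] += self.alpha * (
            td_target - self.q_table[state, action])
```

Under these assumptions, the agent would call `state_of` on each sensor reading, act with `choose_action`, and apply `update` after observing the reward and the next quantized state.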
