On the Efficient Implementation of Biologic Reinforcement Learning Using Eligibility Traces

The eligibility trace is one of the basic mechanisms in reinforcement learning for handling delayed reward. In this paper, we use a meta-heuristic method to solve hard combinatorial optimization problems. Our proposed solution applies the Ant-Q learning method to the Traveling Salesman Problem (TSP). The approach is population-based and uses positive feedback as well as greedy search. We propose an ant reinforcement learning algorithm that uses replacing eligibility traces, called Ant-TD(λ). Although replacing traces differ only slightly from accumulating traces, they can produce a significant improvement in learning rate. Experimental results show that the proposed reinforcement learning method converges to the optimal solution faster than ACS (Ant Colony System) and Ant-Q.
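To make the difference between the two trace types concrete, here is a minimal sketch of generic tabular TD(λ) prediction (the standard backward view) with a switch between accumulating and replacing traces. It is not the authors' Ant-TD(λ) algorithm: the function `td_lambda_episode`, its parameters, and the episode format (a list of `(state, reward, next_state)` tuples) are hypothetical, and applying the idea to Ant-Q's edge values would require the paper's own update details.

```python
from collections import defaultdict


def td_lambda_episode(episode, values, alpha=0.1, gamma=1.0, lam=0.9, replacing=True):
    """Hypothetical helper: tabular TD(lambda) prediction over one episode.

    episode:   list of (state, reward, next_state); next_state None means terminal.
    values:    dict mapping state -> value estimate (updated in place and returned).
    replacing: True  -> replacing traces    (e(s) <- 1)
               False -> accumulating traces (e(s) <- e(s) + 1)
    """
    traces = defaultdict(float)  # eligibility e(s) for every state seen so far
    for state, reward, next_state in episode:
        # One-step TD error, using 0 as the value of the terminal state.
        v_next = 0.0 if next_state is None else values.get(next_state, 0.0)
        delta = reward + gamma * v_next - values.get(state, 0.0)

        # The only difference between the two variants is this trace update.
        if replacing:
            traces[state] = 1.0
        else:
            traces[state] += 1.0

        # Credit every eligible state in proportion to its trace, then decay traces.
        for s in list(traces):
            values[s] = values.get(s, 0.0) + alpha * delta * traces[s]
            traces[s] *= gamma * lam
    return values
```

With `alpha=0.5`, `gamma=1.0`, `lam=1.0` and the episode `[("A", 0.0, "A"), ("A", 1.0, None)]`, where state `A` is revisited before its trace decays, the replacing variant sets the value of `A` to 0.5 while the accumulating variant doubles the update to 1.0. On an episode with no revisits the two rules give identical results, which is why they differ "only slightly" in form.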
