A Cooperation Online Reinforcement Learning Approach in Ant-Q

Online reinforcement learning achieves learning after update estimation value for (state, action) pairs selecting in present state before do state transition by next state. Therefore, online reinforcement learning needs polynomial search time to find most optimal value-function. But, a lots of reinforcement learning that are proposed for online reinforcement learning update estimation value for (state, action) pairs that agents select in present state, and because estimation value for unselected (state, action) pairs is evaluated in other episodes, perfect online reinforcement learning is not. Therefore, in this paper, we propose online ant reinforcement learning method using Ant-Q and eligibility trace to solve this problem. The eligibility trace is one of the basic mechanisms in reinforcement learning to handle delayed reward. The traces are said to indicate the degree to which each state is eligible for undergoing learning changes should a reinforcing event occur. Formally, there are two kinds of eligibility traces(accumulating trace or replacing traces). In this paper, we propose online ant reinforcement learning algorithms using an eligibility traces which is called replace-trace methods. This method is a hybrid of Ant-Q and eligibility traces. Although replacing traces are only slightly different from accumulating traces, it can produce a significant improvement in optimization. We could know through an experiment that proposed reinforcement learning method converges faster to optimal solution than Ant Colony System and Ant-Q.

[1]  Luca Maria Gambardella,et al.  Ant-Q: A Reinforcement Learning Approach to the Traveling Salesman Problem , 1995, ICML.

[2]  Marco Dorigo,et al.  An Investigation of some Properties of an "Ant Algorithm" , 1992, PPSN.

[3]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[4]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[5]  Luca Maria Gambardella,et al.  Solving symmetric and asymmetric TSPs by ant colonies , 1996, Proceedings of IEEE International Conference on Evolutionary Computation.

[6]  T. Stützle,et al.  MAX-MIN Ant System and local search for the traveling salesman problem , 1997, Proceedings of 1997 IEEE International Conference on Evolutionary Computation (ICEC '97).

[7]  SeungGwan Lee Multiagent Reinforcement Learning Algorithm Using Temporal Difference Error , 2005, ISNN.

[8]  Claude-Nicolas Fiechter,et al.  Efficient reinforcement learning , 1994, COLT '94.

[9]  Etienne Barnard,et al.  Temporal-difference methods and Markov models , 1993, IEEE Trans. Syst. Man Cybern..

[10]  Luca Maria Gambardella,et al.  Ant colony system: a cooperative learning approach to the traveling salesman problem , 1997, IEEE Trans. Evol. Comput..

[11]  Marco Dorigo,et al.  Distributed Optimization by Ant Colonies , 1992 .

[12]  Luca Maria Gambardella,et al.  A Study of Some Properties of Ant-Q , 1996, PPSN.

[13]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[14]  Marco Dorigo,et al.  Ant system: optimization by a colony of cooperating agents , 1996, IEEE Trans. Syst. Man Cybern. Part B.

[15]  TaeChoong Chung,et al.  A Reinforcement Learning Algorithm Using Temporal Difference Error in Ant Model , 2005, IWANN.