Analysis about Efficiency of Indirect Media Communication on Multi-agent Cooperation Learning

Reinforcement learning (RL) is an efficient learning method for Markov decision processes (MDPs), and the ant colony system (ACS) is an efficient method for solving combinatorial optimization problems. Based on the update policy of reinforcement values in RL and the cooperation method of indirect media communication in ACS, this paper proposes Q-ACS, a multi-agent cooperative learning method in which the learning agents share episodes beneficial to the exploitation of accumulated knowledge and use the learned reinforcement values efficiently. Further, taking visit counts into account, this paper proposes T-ACS, a multi-agent learning method in which the learning agents share better policies beneficial to exploration during their learning processes. In addition, in light of indirect media communication among heterogeneous agents, this paper presents a heterogeneous multi-agent RL method, D-ACS. The agents in these methods cooperate in a simple way: they exchange information in the form of reinforcement values updated in a model common to all agents. Because they combine active exploration of the unknown environment with effective exploitation of learned knowledge, the proposed methods can solve both MDPs and combinatorial optimization problems effectively. Based on simulation results for the hunter game and the traveling salesman problem, this paper discusses the role of indirect media communication in multi-agent cooperative learning systems and analyzes its efficiency. The experimental results also show that the proposed methods perform competitively with representative methods in each domain.
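The idea of agents cooperating only through reinforcement values stored in a common model can be illustrated with a minimal sketch. This is not the Q-ACS, T-ACS, or D-ACS algorithm from the paper: it is plain tabular Q-learning in which several agents read from and write to one shared table, so each agent benefits from values the others have left behind. The corridor task, the function names `train_shared_q` and `greedy_policy`, and all parameter values are assumptions chosen for illustration.

```python
import random


def train_shared_q(n_states=6, n_agents=3, episodes=200,
                   alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Several agents learn a 1-D corridor task through one shared Q-table.

    Hypothetical toy task: start at state 0, reward 1 on reaching the
    last state. Actions: index 0 = move left, index 1 = move right.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    # The shared table is the only channel between agents (indirect communication).
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        for _agent in range(n_agents):
            s = 0
            for _step in range(4 * n_states):  # cap episode length
                # Epsilon-greedy selection with random tie-breaking.
                if rng.random() < epsilon:
                    a = rng.randrange(2)
                else:
                    best = max(q[s])
                    a = rng.choice([i for i in (0, 1) if q[s][i] == best])
                s_next = min(max(s + moves[a], 0), goal)
                reward = 1.0 if s_next == goal else 0.0
                # Standard Q-learning target; no bootstrapping at the goal.
                target = reward if s_next == goal else reward + gamma * max(q[s_next])
                q[s][a] += alpha * (target - q[s][a])
                s = s_next
                if s == goal:
                    break
    return q


def greedy_policy(q):
    """Return the greedy action index for every state of the shared table."""
    return [max((0, 1), key=lambda i: row[i]) for row in q]
```

Calling `greedy_policy(train_shared_q())` gives the action each agent would pick in every state after training. Because all agents update the same table, a value discovered by one agent is available to the others on their next step, which is the sense in which the communication is indirect.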
