Coordinating Agents in Dynamic Environment

This paper presents strategies for speeding up the convergence of learning agents in a swarm. Accelerating an agent's learning is a complex task: an inadequate update technique may delay the learning process, or induce an unexpected acceleration that drives the agent toward an unsatisfactory policy. We have developed policy-update strategies that combine local and global search using past policies. Experimental results in dynamic environments of different dimensions show that the proposed strategies speed up the agents' convergence while still reaching optimal action policies, improving the coordination of agents in the swarm while deliberating.
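The abstract does not give the update rule itself, but the idea of a policy update that combines local and global information from past policies can be illustrated with a minimal sketch. The blend weight `beta` and the shared table `q_global` are hypothetical names introduced here for illustration, assuming a Q-learning-style temporal-difference update; they are not taken from the paper.

```python
def blended_q_update(q_local, q_global, state, action, reward, next_state,
                     alpha=0.1, gamma=0.9, beta=0.5):
    """One Q-learning step whose bootstrap target mixes the agent's own
    (local) Q-table with a shared table built from past policies.

    beta is a hypothetical knob: 1.0 means purely local search,
    0.0 means the target relies entirely on the global estimate.
    """
    local_best = max(q_local[next_state].values())    # local search estimate
    global_best = max(q_global[next_state].values())  # estimate from past policies
    target = reward + gamma * (beta * local_best + (1 - beta) * global_best)
    # Standard temporal-difference correction toward the blended target.
    q_local[state][action] += alpha * (target - q_local[state][action])
    return q_local[state][action]
```

In a dynamic environment, the global table would be refreshed as past policies become stale, which is one way the trade-off between fast convergence and premature convergence described above can be tuned.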
