On the relation between Ant Colony Optimization and Heuristically Accelerated Reinforcement Learning

This paper has two main goals. The first is to propose a new class of Heuristically Accelerated Reinforcement Learning (HARL) algorithms, the Distributed HARLs, and to describe one algorithm of this class, Heuristically Accelerated Distributed Q-Learning (HADQL). The second is to show that Ant Colony Optimization (ACO) algorithms can be seen as instances of Distributed HARL algorithms. In particular, this paper shows that the Ant Colony System (ACS) algorithm can be interpreted as a particular case of HADQL. This interpretation is attractive because many conclusions obtained for RL algorithms, such as the guarantee of convergence to equilibrium, remain valid for Distributed HARL algorithms. To evaluate the proposal, we compared the performance of Distributed Q-Learning, HADQL and ACS on the Traveling Salesman Problem. The results show that HADQL and ACS perform similarly, as would be expected if they are, in fact, instances of the same class of algorithms.
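To make the claimed correspondence more concrete, the following is a minimal sketch of an HADQL-style learner on a small TSP instance. It is an illustration under our own assumptions, not the paper's exact algorithm: several agents share one Q-table over (current city, next city) pairs; the heuristic is assumed to be H(r, s) = 1/d(r, s) and is combined multiplicatively as Q(r, s)·H(r, s)^β, mirroring the ACS exploitation rule argmax τ(r, s)·η(r, s)^β; with probability `q0` an agent picks the best-scoring unvisited city, and otherwise (a simplification, since ACS explores proportionally) it picks one uniformly at random. Each step applies a zero-reward Q-learning update, and after every iteration the best tour found so far receives a delayed reward `w / best_len` on its edges. The function name `hadql_tsp` and the parameters `alpha`, `gamma`, `beta`, `q0` and `w` are hypothetical, and cities are assumed to be distinct so that no distance is zero.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour that returns to its start city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def hadql_tsp(dist, n_agents=10, n_iters=200, alpha=0.1, gamma=0.3,
              beta=2.0, q0=0.9, w=10.0, seed=0):
    """Hypothetical HADQL-style sketch for the TSP (not the paper's exact method).

    Q[r][s]: value table shared by all agents over (current city, next city).
    H[r][s]: fixed distance heuristic 1 / dist[r][s] (assumes distinct cities).
    """
    rng = random.Random(seed)
    n = len(dist)
    # Small positive initial value, analogous to ACS's initial pheromone level.
    q_init = 1.0 / (n * tour_length(list(range(n)), dist))
    Q = [[q_init] * n for _ in range(n)]
    H = [[0.0 if r == s else 1.0 / dist[r][s] for s in range(n)] for r in range(n)]

    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        for _agent in range(n_agents):
            start = rng.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                r = tour[-1]
                candidates = sorted(unvisited)
                if rng.random() < q0:
                    # Exploit: heuristically accelerated greedy choice, Q * H^beta.
                    s = max(candidates, key=lambda c: Q[r][c] * H[r][c] ** beta)
                else:
                    # Explore: uniform random choice among unvisited cities.
                    s = rng.choice(candidates)
                unvisited.remove(s)
                # Q-learning step with zero immediate reward, bootstrapping from the
                # best value still reachable from s (0.0 when s is the last city).
                future = max((Q[s][c] for c in unvisited), default=0.0)
                Q[r][s] = (1 - alpha) * Q[r][s] + alpha * gamma * future
                tour.append(s)
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
        # Delayed reward: reinforce every edge of the best tour found so far.
        for i in range(n):
            r, s = best_tour[i], best_tour[(i + 1) % n]
            Q[r][s] = (1 - alpha) * Q[r][s] + alpha * (w / best_len)
    return best_tour, best_len


if __name__ == "__main__":
    # Four cities on the corners of a unit square; the optimal tour is the perimeter.
    pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    print(hadql_tsp(dist, n_iters=20))
```

The design choice worth noting is that the same greedy rule serves both readings: if Q is read as a pheromone table τ and H as the visibility η, the exploitation step is the ACS transition rule, which is the kind of correspondence the abstract describes.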
