Scaling ant colony optimization with hierarchical reinforcement learning partitioning

This paper merges hierarchical reinforcement learning (HRL) with ant colony optimization (ACO) to produce an HRL-ACO algorithm capable of generating solutions for large problem domains. It describes two implementations of the new algorithm: the first is a modification of Dietterich's MAXQ-Q HRL algorithm, and the second is a hierarchical ant colony system algorithm. Both implementations produce results faster, with little to no significant change in solution quality on the tested problem domains. The first implementation replaces the reinforcement learning component of MAXQ-Q, Q-learning, with Ant-Q, a modified ant colony optimization method. The resulting algorithm, MAXQ-AntQ, converges to solutions not significantly different from those of MAXQ-Q in 88% of the time MAXQ-Q takes. The paper then transfers HRL techniques to the ACO domain and the traveling salesman problem (TSP). Applying HRL to ACO requires a hierarchy for the TSP: a data clustering algorithm creates the subtasks, and an ACO algorithm solves both the individual subproblems and the complete problem. Two clustering algorithms are tested, k-means and G-means. The results show that the algorithm with data clustering produces solutions 20 times faster, at the cost of a 5-10% decrease in solution quality due to the effects of clustering.
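The cluster-then-solve idea for the TSP can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes plain Lloyd's k-means (not G-means), a basic Ant System in place of the ant colony system, and a naive stitching step that simply concatenates the per-cluster orders in the order chosen for the cluster centroids. All function names and parameter values (`ants`, `iters`, `alpha`, `beta`, `rho`) are hypothetical choices made for illustration.

```python
import math
import random


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns index lists for the non-empty clusters."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(i)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[c] = (sum(points[i][0] for i in members) / len(members),
                              sum(points[i][1] for i in members) / len(members))
    return [m for m in clusters if m]


def aco_tour(points, rng, ants=10, iters=30, alpha=1.0, beta=2.0, rho=0.1):
    """Basic Ant System (assumed stand-in for ACS); returns the best order found."""
    n = len(points)
    if n <= 3:  # every ordering of up to 3 points gives the same closed tour
        return list(range(n))
    d = [[dist(points[i], points[j]) or 1e-9 for j in range(n)] for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best, best_len = None, float("inf")
    for _ in range(iters):
        tours = []
        for _ in range(ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Edge attractiveness: pheromone^alpha * (1/distance)^beta.
                weights = [tau[i][j] ** alpha * (1.0 / d[i][j]) ** beta for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(points, tour)
            tours.append((length, tour))
            if length < best_len:
                best, best_len = tour, length
        for i in range(n):  # evaporation
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for length, tour in tours:  # deposit, inversely proportional to length
            for a, b in zip(tour, tour[1:] + tour[:1]):
                tau[a][b] += 1.0 / max(length, 1e-9)
                tau[b][a] += 1.0 / max(length, 1e-9)
    return best


def clustered_tour(points, k, seed=0):
    """Two-level solve: order the clusters, then order the cities inside each."""
    rng = random.Random(seed)
    clusters = kmeans(points, k, rng)
    centroids = [(sum(points[i][0] for i in m) / len(m),
                  sum(points[i][1] for i in m) / len(m)) for m in clusters]
    tour = []
    for c in aco_tour(centroids, rng):  # top level: visit order of clusters
        members = clusters[c]
        sub = aco_tour([points[i] for i in members], rng)  # lower level
        tour.extend(members[j] for j in sub)  # naive stitching (assumption)
    return tour


if __name__ == "__main__":
    rng = random.Random(42)
    cities = [(rng.random(), rng.random()) for _ in range(60)]
    tour = clustered_tour(cities, k=4)
    print(len(tour), tour_length(cities, tour))
```

Each sub-problem has far fewer cities than the full instance, which is where a speedup of this kind would come from; the edges that join neighbouring clusters are not optimized here, which is one plausible source of the quality loss attributed to clustering.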
