Global search in combinatorial optimization using reinforcement learning algorithms

This paper presents two approaches that address two weaknesses of reinforcement learning (RL) algorithms for combinatorial optimization: the local character of the search and imprecise state representation. The first approach is Bayesian and aims to capture interdependencies among solution parameters. The second combines local information, as encoded by typical RL schemes, with global information, as contained in a population of search agents. The effectiveness of both approaches is demonstrated on the quadratic assignment problem. Competitive results obtained with the RL-agent approach suggest that it can serve as a basis for global optimization techniques.
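For readers unfamiliar with the benchmark, the following is a minimal sketch of the standard quadratic assignment problem objective. It is not the paper's method. The names `qap_cost` and `brute_force_qap` and the flow/distance matrices passed to them are illustrative assumptions. A permutation `perm` assigns facility `i` to location `perm[i]`, and the cost sums flow times distance over all facility pairs.

```python
from itertools import permutations


def qap_cost(flow, dist, perm):
    """Standard QAP objective: sum of flow[i][j] * dist[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(
        flow[i][j] * dist[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    )


def brute_force_qap(flow, dist):
    """Exhaustive baseline (hypothetical helper); only feasible for tiny n."""
    n = len(flow)
    best = min(permutations(range(n)), key=lambda p: qap_cost(flow, dist, p))
    return list(best), qap_cost(flow, dist, best)
```

Exhaustive enumeration grows factorially with the number of facilities, which is why search heuristics such as the RL-based ones discussed here are applied to this problem.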