Tuning Local Search by Average-Reward Reinforcement Learning

Reinforcement learning and local search have been combined in a variety of ways to learn how to solve combinatorial problems more efficiently. Most approaches optimise the total reward, where the reward for each action is the change in the objective function. We argue that it is more appropriate to optimise the average reward. We use R-learning to dynamically tune the noise parameter of standard SAT local search algorithms on single instances. Experiments show that noise tuning can be successfully automated in this way.
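As a rough illustration of the idea, the sketch below pairs the standard R-learning update with a simplified WalkSAT-style flip. The abstract does not specify the state or action design, so everything here is an assumption made for illustration: a single dummy state, a hypothetical set of candidate noise levels (`NOISE_LEVELS`), epsilon-greedy action selection, and a reward equal to the decrease in the number of unsatisfied clauses. It should not be read as the authors' implementation.

```python
import random

NOISE_LEVELS = [0.0, 0.1, 0.3, 0.5]  # hypothetical action set: candidate noise values

def num_unsat(clauses, assign):
    """Count clauses with no true literal; literal l > 0 means variable l is True."""
    return sum(1 for c in clauses if not any(assign[abs(l)] == (l > 0) for l in c))

def r_learning_update(q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One R-learning update on the table q[state][action]; returns the new rho."""
    best_here, best_next = max(q[s]), max(q[s_next])
    greedy = q[s][a] == best_here
    # Relative value: reward measured against the average-reward estimate rho.
    q[s][a] += alpha * (r - rho + best_next - q[s][a])
    if greedy:  # rho is adjusted only after non-exploratory actions
        rho += beta * (r - rho + best_next - best_here)
    return rho

def flip_with_noise(clauses, assign, noise, rng):
    """Simplified WalkSAT-style step: flip one variable of a random unsatisfied clause."""
    unsat = [c for c in clauses if not any(assign[abs(l)] == (l > 0) for l in c)]
    clause = rng.choice(unsat)
    if rng.random() < noise:
        var = abs(rng.choice(clause))  # noisy move: random variable in the clause
    else:
        def cost_after(v):
            # Cost if v were flipped; the flip is undone before returning.
            assign[v] = not assign[v]
            cost = num_unsat(clauses, assign)
            assign[v] = not assign[v]
            return cost
        var = min((abs(l) for l in clause), key=cost_after)  # greedy move
    assign[var] = not assign[var]

def solve(clauses, n_vars, max_flips=10000, epsilon=0.1, seed=0):
    """Local search whose noise level is chosen per flip by R-learning."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    q, rho, s = {0: [0.0] * len(NOISE_LEVELS)}, 0.0, 0  # assumed single dummy state
    cost = num_unsat(clauses, assign)
    for flips in range(max_flips):
        if cost == 0:
            return assign, flips, rho
        if rng.random() < epsilon:
            a = rng.randrange(len(NOISE_LEVELS))   # exploratory action
        else:
            a = q[s].index(max(q[s]))              # greedy action
        flip_with_noise(clauses, assign, NOISE_LEVELS[a], rng)
        new_cost = num_unsat(clauses, assign)
        # Assumed reward: improvement in the objective (fewer unsatisfied clauses).
        rho = r_learning_update(q, rho, s, a, cost - new_cost, s)
        cost = new_cost
    return (assign if cost == 0 else None), max_flips, rho
```

A call such as `solve([[1, 2], [-1, 3], [-2, -3]], n_vars=3)` returns the assignment found (or `None`), the number of flips used, and the final average-reward estimate. Because `rho` tracks reward per flip rather than accumulated reward, the learner is pushed toward noise levels that sustain progress over a long run, which is the motivation the abstract gives for preferring average reward.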
