RLBS: An Adaptive Backtracking Strategy Based on Reinforcement Learning for Combinatorial Optimization

Combinatorial optimization problems are often very difficult to solve, and the choice of search strategy strongly influences the solver's performance. A search strategy is said to be adaptive when it dynamically adapts to the structure of the problem instance and identifies the areas of the search space that contain good solutions. We introduce an algorithm (RLBS) that learns to backtrack efficiently when searching non-binary trees. Branching can be carried out using any usual variable/value selection strategy. However, when backtracking is needed, the node to backtrack to is selected using reinforcement learning. Because the trees are non-binary, the search can backtrack to each node many times, which makes it possible to learn which nodes generally lead to the best rewards (that is, to the most interesting leaves). RLBS is evaluated on a scheduling problem using real industrial data. It outperforms classic (nonadaptive) backtracking strategies (DFS, LDS) as well as an adaptive branching strategy (IBS).
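The idea of learning where to backtrack can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rlbs_sketch`, the reward (negative leaf cost), the epsilon-greedy choice, the step size `alpha`, and the toy tree are all assumptions made for illustration. Branching always follows the first child (standing in for any variable/value heuristic), and only the choice of backtrack target is learned.

```python
import random

def rlbs_sketch(root, children, leaf_cost, budget, alpha=0.5, epsilon=0.1, seed=0):
    """Toy learned-backtracking search (hypothetical); returns (best_cost, best_leaf)."""
    rng = random.Random(seed)
    pending = {}  # node -> children not yet explored (backtrack candidates)
    q = {}        # node -> running estimate of the reward of backtracking to it
    best_cost, best_leaf = float("inf"), None

    def dive(node):
        # Follow the heuristic's first child down to a leaf, recording every
        # node on the way that still has unexplored children.
        while True:
            kids = list(children(node))
            if not kids:
                return node
            if len(kids) > 1:
                pending[node] = kids[1:]
                q.setdefault(node, 0.0)
            node = kids[0]

    def visit(leaf):
        # Evaluate a leaf, keep the incumbent, and return the reward (assumed: -cost).
        nonlocal best_cost, best_leaf
        cost = leaf_cost(leaf)
        if cost < best_cost:
            best_cost, best_leaf = cost, leaf
        return -cost

    visit(dive(root))
    for _ in range(budget):
        if not pending:
            break  # tree fully explored
        candidates = list(pending)
        if rng.random() < epsilon:
            target = rng.choice(candidates)               # explore
        else:
            target = max(candidates, key=lambda n: q[n])  # exploit learned values
        child = pending[target].pop(0)
        if not pending[target]:
            del pending[target]
        reward = visit(dive(child))
        # Incremental update toward the reward observed after this backtrack.
        q[target] += alpha * (reward - q[target])
    return best_cost, best_leaf

# Toy non-binary tree: three decisions, three values each; nodes are tuples.
def toy_children(node):
    return [node + (v,) for v in range(3)] if len(node) < 3 else []

def toy_cost(leaf):
    return abs(sum(leaf) - 4) + leaf[0]

result = rlbs_sketch((), toy_children, toy_cost, budget=100)
```

Because a non-binary node keeps several unexplored children, the same node can be chosen as a backtrack target repeatedly, which is what gives its value estimate time to become informative. With a budget large enough to exhaust the toy tree, the sketch returns the optimal leaf; with a smaller budget, the learned estimates determine which parts of the tree are visited first.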
