Learning to Search via Retrospective Imitation

We study the problem of learning a good search policy from demonstrations in combinatorial search spaces. We propose retrospective imitation learning, which, after initial training by an expert, improves itself by learning from its own retrospective solutions. That is, when the policy eventually reaches a feasible solution in a search tree after making mistakes and backtracking, it retrospectively constructs an improved search trace to the solution by removing the backtracks, which is then used to further train the policy. A key feature of our approach is that it can iteratively scale up, or transfer, to problem sizes larger than those of the initial expert demonstrations, thus dramatically expanding its applicability beyond that of conventional imitation learning. We showcase the effectiveness of our approach on two tasks: synthetic maze solving and integer-program-based risk-aware path planning.
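The retrospective step described above can be sketched concretely. Assuming the search trace records a parent pointer for every expanded node, the improved trace is simply the direct root-to-solution path, which omits every subtree the search entered and later abandoned. The names and data layout below are illustrative assumptions, not the authors' implementation:

```python
def retrospective_trace(parent, solution_node):
    """Recover the backtrack-free path from the root to a feasible solution.

    parent: dict mapping each expanded node to its parent (root maps to None).
    solution_node: the feasible leaf the search eventually reached.
    Returns the direct path root -> solution; all nodes explored and then
    backtracked from are excluded, yielding the improved search trace.
    """
    path = []
    node = solution_node
    while node is not None:
        path.append(node)
        node = parent[node]
    return list(reversed(path))


# Tiny example: the search tried 'a' first, backtracked, then found the goal
# under 'b'. The retrospective trace keeps only root -> b -> c -> goal.
parent = {"root": None, "a": "root", "b": "root", "c": "b", "goal": "c"}
print(retrospective_trace(parent, "goal"))
```

Each (state, chosen-child) pair along the recovered path can then serve as a positive demonstration when further training the search policy.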
