Learning to Search via Self-Imitation

We study the problem of learning a good search policy. To do so, we propose the self-imitation learning setting, which builds upon imitation learning in two ways. First, self-imitation uses feedback obtained by retrospectively analyzing demonstrated search traces. Second, the policy can learn from its own decisions and mistakes without requiring repeated feedback from an external expert. Together, these two properties allow our approach to scale iteratively to problems larger than those for which expert demonstrations were originally provided. We demonstrate the effectiveness of our approach on a synthetic maze-solving task and on risk-aware path planning.
