Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning

Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.

[1]  Bernhard Nebel,et al.  Extending Planning Graphs to an ADL Subset , 1997, ECP.

[2]  Alan Fern,et al.  The first learning track of the international planning competition , 2011, Machine Learning.

[3]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[4]  Nir Lipovetzky,et al.  Structure and inference in classical planning , 2014 .

[5]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[6]  Razvan Pascanu,et al.  Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.

[7]  K. Gödel Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I , 1931 .

[8]  J. Meigs,et al.  WHO Technical Report , 1954, The Yale Journal of Biology and Medicine.

[9]  R. Lathe Phd by thesis , 1988, Nature.

[10]  K. Gödel Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I , 1931 .

[11]  Nils J. Nilsson,et al.  Artificial Intelligence , 1974, IFIP Congress.

[12]  Stefanie Tellex,et al.  Goal-Based Action Priors , 2015, ICAPS.

[13]  Erik D. Demaine,et al.  PSPACE-completeness of sliding-block puzzles and other problems through the nondeterministic constraint logic model of computation , 2002, Theor. Comput. Sci..

[14]  Olivier Roussel,et al.  The International SAT Solver Competitions , 2012, AI Mag..

[15]  Tom Bylander,et al.  The Computational Complexity of Propositional STRIPS Planning , 1994, Artif. Intell..

[16]  Demis Hassabis,et al.  Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.

[17]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[18]  Henry A. Kautz,et al.  BLACKBOX: A New Approach to the Application of Theorem Proving to Problem Solving , 1998 .

[19]  J. van Leeuwen,et al.  Theoretical Computer Science , 2003, Lecture Notes in Computer Science.

[20]  Monatshefte für Mathematik und Physik , 1892 .

[21]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[22]  Maria Fox,et al.  PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains , 2003, J. Artif. Intell. Res..

[23]  Jörg Hoffmann,et al.  FF: The Fast-Forward Planning System , 2001, AI Mag..

[24]  Lukás Chrpa,et al.  The 2014 International Planning Competition: Progress and Trends , 2015, AI Mag..

[25]  Avrim Blum,et al.  Fast Planning Through Planning Graph Analysis , 1995, IJCAI.

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Pieter Abbeel,et al.  Learning Generalized Reactive Policies using Deep Neural Networks , 2017, ICAPS.

[28]  Hector J. Levesque,et al.  A New Method for Solving Hard Satisfiability Problems , 1992, AAAI.

[29]  Joao Marques-Silva,et al.  GRASP: A Search Algorithm for Propositional Satisfiability , 1999, IEEE Trans. Computers.

[30]  Jonathan Schaeffer,et al.  Sokoban: improving the search with relevance cuts , 2001, Theor. Comput. Sci..

[31]  Benjamin Rosman,et al.  What good are actions? Accelerating learning using learned action priors , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).