Learning Generalized Reactive Policies using Deep Neural Networks

We present a new approach to learning for planning, where knowledge acquired while solving a given set of planning problems is used to plan faster in related, but new problem instances. We show that a deep neural network can be used to learn and represent a \emph{generalized reactive policy} (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances. In contrast to prior efforts in this direction, our approach significantly reduces the dependence of learning on handcrafted domain knowledge or feature selection. Instead, the GRP is trained from scratch using a set of successful execution traces. We show that our approach can also be used to automatically learn a heuristic function that can be used in directed search algorithms. We evaluate our approach using an extensive suite of experiments on two challenging planning problem domains and show that our approach facilitates learning complex decision making policies and powerful heuristic functions with minimal human input. Videos of our results are available at this http URL.

[1]  Nils J. Nilsson,et al.  A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..

[2]  Richard Fikes,et al.  Learning and Executing Generalized Robot Plans , 1993, Artif. Intell..

[3]  Dean Pomerleau,et al.  ALVINN, an autonomous land vehicle in a neural network , 2015 .

[4]  Jude W. Shavlik,et al.  Acquiring Recursive Concepts with Explanation-Based Learning , 1989, IJCAI.

[5]  Tom Bylander,et al.  The Computational Complexity of Propositional STRIPS Planning , 1994, Artif. Intell..

[6]  Subbarao Kambhampati,et al.  A Unified Framework for Explanation-Based Generalization of Partially Ordered and Partially Instantiated Plans , 1994, Artif. Intell..

[7]  Gerald Tesauro,et al.  Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..

[8]  Roni Khardon,et al.  Learning Action Strategies for Planning Domains , 1999, Artif. Intell..

[9]  Hector Geffner,et al.  Learning Generalized Policies in Planning Using Concept Languages , 2000, KR.

[10]  Jörg Hoffmann,et al.  FF: The Fast-Forward Planning System , 2001, AI Mag..

[11]  Jonathan Schaeffer,et al.  Sokoban: Enhancing general single-agent search methods using domain knowledge , 2001, Artif. Intell..

[12]  Robert Givan,et al.  Inductive Policy Selection for First-Order MDPs , 2002, UAI.

[13]  Maria Fox,et al.  PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains , 2003, J. Artif. Intell. Res..

[14]  Hector Geffner,et al.  Learning Generalized Policies from Planning Examples Using Concept Languages , 2004, Applied Intelligence.

[15]  Marco Gori,et al.  Likely-Admissible and Sub-Symbolic Heuristics , 2004, ECAI.

[16]  Luca Spalazzi,et al.  A Survey on Case-Based Planning , 2004, Artificial Intelligence Review.

[17]  G. Konidaris A Framework for Transfer in Reinforcement Learning , 2006 .

[18]  Malte Helmert,et al.  The Fast Downward Planning System , 2006, J. Artif. Intell. Res..

[19]  Jude W. Shavlik,et al.  Skill Acquisition Via Transfer Learning and Advice Taking , 2006, ECML.

[20]  Jude W. Shavlik,et al.  Relational Macros for Transfer in Reinforcement Learning , 2007, ILP.

[21]  Robert Givan,et al.  Learning Control Knowledge for Forward Search Planning , 2008, J. Mach. Learn. Res..

[22]  Malte Helmert,et al.  Concise finite-domain representations for PDDL planning tasks , 2009, Artif. Intell..

[23]  Silvia Richter,et al.  The LAMA Planner: Guiding Cost-Based Anytime Planning with Landmarks , 2010, J. Artif. Intell. Res..

[24]  Neil Immerman,et al.  Directed Search for Generalized Plans Using Classical Planners , 2011, ICAPS.

[25]  Alan Fern,et al.  The first learning track of the international planning competition , 2011, Machine Learning.

[26]  Joshua Taylor,et al.  Procedural Generation of Sokoban Levels , 2011 .

[27]  Neil Immerman,et al.  A new representation and associated algorithms for generalized planning , 2011, Artif. Intell..

[28]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[29]  Yuxiao Hu,et al.  Generalized Planning: Synthesizing Plans that Work for Multiple Environments , 2011, IJCAI.

[30]  Oliver Kroemer,et al.  Learning to select and generalize striking movements in robot table tennis , 2012, AAAI Fall Symposium: Robots Learning Interactively from Human Teachers.

[31]  Andrew G. Barto,et al.  Transfer in Reinforcement Learning via Shared Features , 2012, J. Mach. Learn. Res..

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  Benjamin Rosman,et al.  What good are actions? Accelerating learning using learned action priors , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

[34]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[35]  Pietro Torasso,et al.  Deordering and Numeric Macro Actions for Plan Repair , 2015, IJCAI.

[36]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37]  Stefanie Tellex,et al.  Goal-Based Action Priors , 2015, ICAPS.

[38]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[39]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[40]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[41]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Tom Schaul,et al.  The Predictron: End-To-End Learning and Planning , 2016, ICML.

[43]  Marcin Andrychowicz,et al.  Hindsight Experience Replay , 2017, NIPS.

[44]  Marcin Andrychowicz,et al.  One-Shot Imitation Learning , 2017, NIPS.

[45]  Razvan Pascanu,et al.  Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.

[46]  Jitendra Malik,et al.  Combining self-supervised learning and imitation for vision-based rope manipulation , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[47]  Dileep George,et al.  Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics , 2017, ICML.

[48]  Roland Siegwart,et al.  From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[49]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[50]  Le Song,et al.  2 Common Formulation for Greedy Algorithms on Graphs , 2018 .