Learning value functions with relational state representations for guiding task-and-motion planning

We propose a novel relational state representation and an algorithm that learns an action-value function from planning experience for geometric task-and-motion planning (GTAMP) problems, in which the goal is to move several objects to target regions in the presence of movable obstacles. The representation captures which objects occlude the manipulation of other objects and is expressed with a small set of predicates. It supports efficient learning, using graph neural networks, of an action-value function that can be used to guide a GTAMP solver. Importantly, it enables learning from planning experience on simple problems and generalizing to more complex problems, and even to substantially different geometric environments. We demonstrate the method in two challenging GTAMP domains.

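As an illustration only, and not the authors' implementation, the sketch below shows one way the abstract's ingredients could fit together: objects and regions become graph nodes, binary predicates such as occlusion become typed edges, and a small message-passing network maps the resulting graph to a scalar value that could score candidate actions for a GTAMP solver. All class names, feature dimensions, and the single round of message passing are assumptions made for this example.

```python
# Minimal sketch (assumptions throughout): a relational state graph with
# predicate-typed edges, scored by a tiny message-passing network.
import numpy as np

rng = np.random.default_rng(0)

NODE_DIM, EDGE_DIM, HID = 8, 2, 16   # assumed feature sizes


def relu(x):
    return np.maximum(x, 0.0)


class RelationalQNet:
    """One round of message passing followed by a global readout."""

    def __init__(self):
        self.W_msg = rng.normal(0, 0.1, (NODE_DIM + NODE_DIM + EDGE_DIM, HID))
        self.W_upd = rng.normal(0, 0.1, (NODE_DIM + HID, NODE_DIM))
        self.W_out = rng.normal(0, 0.1, (NODE_DIM, 1))

    def q_value(self, node_feats, edges):
        # edges: list of (src, dst, edge_feat) triples, where edge_feat is a
        # one-hot code for a predicate such as Occludes(src, dst).
        msgs = np.zeros((len(node_feats), HID))
        for s, d, e in edges:
            inp = np.concatenate([node_feats[s], node_feats[d], e])
            msgs[d] += relu(inp @ self.W_msg)
        # Update node states with aggregated messages, then pool to a scalar.
        updated = relu(np.concatenate([node_feats, msgs], axis=1) @ self.W_upd)
        return (updated.mean(axis=0) @ self.W_out).item()


# Tiny example: two movable objects and one target region.
# In a full action-value function the candidate action would also be encoded,
# e.g. as extra features on the manipulated object's node; omitted here.
nodes = rng.normal(size=(3, NODE_DIM))          # [obj_A, obj_B, region_R]
edges = [
    (1, 0, np.array([1.0, 0.0])),               # Occludes(obj_B, obj_A)
    (0, 2, np.array([0.0, 1.0])),               # GoalRegion(obj_A, region_R)
]
print(RelationalQNet().q_value(nodes, edges))
```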