Learning Quickly to Plan Quickly Using Modular Meta-Learning

Multi-object manipulation problems in continuous state and action spaces can be solved by planners that search over sampled values for the continuous parameters of operators. The efficiency of these planners depends critically on the effectiveness of the samplers used, but effective sampling in turn depends on details of the robot, environment, and task. Our strategy is to learn functions called specializers that generate values for continuous operator parameters, given a state description and values for the discrete parameters. Rather than trying to learn a single specializer for each operator from large amounts of data on a single task, we take a modular meta-learning approach. We train on multiple tasks and learn a variety of specializers that, on a new task, can be quickly adapted using relatively little data. Thus, our system learns quickly to plan quickly using these specializers. We validate our approach experimentally in simulated 3D pick-and-place tasks with continuous state and action spaces. Visit http://tinyurl.com/chitnis-icra-19 for a supplementary video.
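As a rough illustration of the idea only, the sketch below treats a specializer as a function from a state description and discrete operator arguments to a continuous parameter value, and "adapts" to a new task by picking, from a small library, the specializer that best fits a few examples. Everything here is a hypothetical simplification: the names (`make_offset_specializer`, `adapt`), the one-dimensional state, and the selection-by-squared-error rule are assumptions for illustration, not the learning procedure used in the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a state maps object names to a 1-D position,
# and a specializer returns one continuous operator parameter.
State = Dict[str, float]
DiscreteArgs = Tuple[str, ...]
Specializer = Callable[[State, DiscreteArgs], float]
Example = Tuple[State, DiscreteArgs, float]  # (state, discrete args, good value)


def make_offset_specializer(offset: float) -> Specializer:
    """Build a toy specializer: place the gripper at the object's position plus an offset."""
    def specializer(state: State, discrete_args: DiscreteArgs) -> float:
        (obj,) = discrete_args  # the single discrete parameter: which object
        return state[obj] + offset
    return specializer


def adapt(library: List[Specializer], examples: List[Example]) -> Specializer:
    """Pick the library specializer with the lowest squared error on a few examples.

    This stands in for fast adaptation on a new task; it is an assumed,
    simplified selection rule, not the paper's algorithm.
    """
    def total_error(spec: Specializer) -> float:
        return sum((spec(s, a) - target) ** 2 for s, a, target in examples)
    return min(library, key=total_error)


# A library standing in for specializers learned on earlier tasks.
library = [make_offset_specializer(o) for o in (-0.1, 0.0, 0.1)]

# Two examples from a new task in which good values sit 0.1 past the object.
examples: List[Example] = [
    ({"cup": 0.5}, ("cup",), 0.6),
    ({"cup": 0.2}, ("cup",), 0.3),
]

best = adapt(library, examples)
proposal = best({"cup": 1.0}, ("cup",))  # value a planner could try for this operator
```

In this toy setting the adapted specializer can be called inside a planner wherever a continuous parameter value is needed, in place of drawing many candidate samples.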
