Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

We address one-shot imitation learning, where the goal is to execute a previously unseen task from a single demonstration. While there has been exciting progress in this direction, most approaches still require a few hundred tasks for meta-training, which limits their scalability. Our main contribution is to formulate one-shot imitation learning as a symbolic planning problem combined with a symbol grounding problem. This formulation disentangles policy execution from inter-task generalization and leads to better data efficiency. The key technical challenge is that symbol grounding is error-prone when training data is limited, and grounding errors cause subsequent symbolic planning failures. We address this challenge by proposing a continuous relaxation of the discrete symbolic planner that plans directly on the probabilistic outputs of the symbol grounding model. The relaxed planner can still exploit the information contained in the probabilistic symbol grounding, and it significantly improves over the baseline planner on one-shot imitation learning tasks without requiring large amounts of training data.
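To make the contrast between a discrete planner and a relaxed one concrete, here is a toy sketch. It is not the authors' algorithm. It illustrates why planning on grounding probabilities can succeed where planning on thresholded symbols fails. It assumes a STRIPS-style blocks domain with hypothetical `pick(A)` and `stack(A,B)` actions, treats predicates as independent, scores a precondition by the product of its predicate probabilities, applies effects in expectation, and finds the best plan by exhaustive search over short action sequences. The names `Action`, `discrete_plan`, `soft_apply`, `goal_score`, and `relaxed_plan`, along with all probabilities, are illustrative assumptions.

```python
from collections import deque
from itertools import product
from math import prod
from typing import Dict, FrozenSet, NamedTuple, Optional, Sequence, Tuple


class Action(NamedTuple):
    """Hypothetical STRIPS-style action: preconditions, add effects, delete effects."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


# Predicate -> probability that it holds, as output by a (hypothetical) grounding model.
Beliefs = Dict[str, float]


def discrete_plan(probs: Beliefs, goal: FrozenSet[str], actions: Sequence[Action],
                  horizon: int, threshold: float = 0.5) -> Optional[Tuple[str, ...]]:
    """Baseline: threshold the grounding into hard symbols, then breadth-first search."""
    start = frozenset(p for p, v in probs.items() if v >= threshold)
    frontier = deque([(start, ())])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        if len(plan) == horizon:
            continue
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + (a.name,)))
    return None  # no plan exists under the thresholded symbols


def soft_apply(state: Beliefs, a: Action) -> Beliefs:
    """Relaxed step: the action fires with probability p_ok (independence assumed);
    otherwise the state is unchanged. Effects are applied in expectation."""
    p_ok = prod(state.get(q, 0.0) for q in a.pre)
    nxt = dict(state)  # copy, so the caller's beliefs are never mutated
    for q in a.add:
        nxt[q] = p_ok + (1.0 - p_ok) * state.get(q, 0.0)
    for q in a.delete:
        nxt[q] = (1.0 - p_ok) * state.get(q, 0.0)
    return nxt


def goal_score(state: Beliefs, goal: FrozenSet[str]) -> float:
    """Probability that every goal predicate holds, again assuming independence."""
    return prod(state.get(g, 0.0) for g in goal)


def relaxed_plan(probs: Beliefs, goal: FrozenSet[str], actions: Sequence[Action],
                 horizon: int) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively score every action sequence up to `horizon` on soft states."""
    best_plan, best_score = (), goal_score(probs, goal)
    for length in range(1, horizon + 1):
        for seq in product(actions, repeat=length):
            state = probs
            for a in seq:
                state = soft_apply(state, a)
            score = goal_score(state, goal)
            if score > best_score:
                best_plan, best_score = tuple(a.name for a in seq), score
    return best_plan, best_score


ACTIONS = [
    Action("pick(A)", frozenset({"clear(A)", "ontable(A)", "handempty"}),
           frozenset({"holding(A)"}), frozenset({"clear(A)", "ontable(A)", "handempty"})),
    Action("stack(A,B)", frozenset({"holding(A)", "clear(B)"}),
           frozenset({"on(A,B)", "clear(A)", "handempty"}), frozenset({"holding(A)", "clear(B)"})),
]
GOAL = frozenset({"on(A,B)"})
# Noisy grounding: the model is unsure whether the gripper is empty.
PROBS = {"clear(A)": 0.9, "ontable(A)": 0.9, "handempty": 0.45, "clear(B)": 0.8}

if __name__ == "__main__":
    print("discrete:", discrete_plan(PROBS, GOAL, ACTIONS, horizon=2))
    print("relaxed: ", relaxed_plan(PROBS, GOAL, ACTIONS, horizon=2))
```

In this example the grounding model assigns only 0.45 to `handempty`. Thresholding at 0.5 makes `pick(A)` inapplicable, so the discrete search finds no plan. The relaxed search still returns `pick(A)`, `stack(A,B)`, with a goal score of about 0.29 (0.9 × 0.9 × 0.45 × 0.8). Exhaustive enumeration is only practical for tiny horizons and action sets. A realistic planner would need a more efficient search over the relaxed state.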
