Synthesizing Environment-Aware Activities via Activity Sketches

In order to learn to perform activities from demonstrations or descriptions, agents need to distill the essence of a given activity and understand how it can be adapted to new environments. In this work, we address the problem of environment-aware program generation. Given a visual demonstration or a description of an activity, we generate program sketches representing its essential instructions, and propose a model to flesh these sketches out into full programs representing the actions needed to perform the activity under the presented environmental constraints. To this end, we build upon VirtualHome to create a new dataset, VirtualHome-Env, in which we collect program sketches representing activities and match programs with environments that can afford them. Furthermore, we construct one knowledge base to sample realistic environments and another to seek out the programs executable in the sampled environments. Finally, we propose RNN-ResActGraph, a network that generates a program from a given sketch and an environment graph while tracking the changes the program induces in the environment.
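To make the setting concrete, below is a minimal, hypothetical Python sketch of what fleshing an activity sketch out into an environment-aware program could look like. The paper's model is a learned network over environment graphs; here the expansion is a single hand-written rule, and every name (`Node`, `flesh_out`, the action vocabulary) is an illustrative assumption, not the dataset's actual format.

```python
# A minimal, hypothetical illustration of expanding an activity sketch
# into a full program under environmental constraints. Every name and
# rule here is an assumption for illustration, not the paper's API.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One object instance in the environment graph."""
    id: int
    category: str                              # e.g. "milk", "fridge"
    states: set = field(default_factory=set)   # e.g. {"CLOSED"}
    inside: Optional[int] = None               # id of the containing node

def flesh_out(sketch, graph):
    """Ground each sketch step (action, category) to a concrete object
    and insert the extra steps the environment demands, e.g. opening a
    closed container before grabbing what is inside it."""
    by_id = {n.id: n for n in graph}
    program = []
    for action, category in sketch:
        target = next(n for n in graph if n.category == category)
        container = by_id.get(target.inside)
        if container is not None and "CLOSED" in container.states:
            program += [("Walk", container.category),
                        ("Open", container.category)]
            container.states.discard("CLOSED")  # track the induced change
        program.append((action, target.category))
    return program

# Toy environment: the milk starts inside a closed fridge, so the
# three-step sketch for "drink milk" must grow two extra steps.
fridge = Node(1, "fridge", {"CLOSED"})
milk = Node(2, "milk", inside=1)
sketch = [("Walk", "milk"), ("Grab", "milk"), ("Drink", "milk")]
print(flesh_out(sketch, [fridge, milk]))
# [('Walk', 'fridge'), ('Open', 'fridge'), ('Walk', 'milk'),
#  ('Grab', 'milk'), ('Drink', 'milk')]
```

In the actual approach this expansion is learned rather than rule-based: the environment graph is encoded by a network whose state is updated as the program unfolds, which is what lets the model track the changes the program induces in the environment.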
