Visual Robot Task Planning

Prospection is key to solving challenging problems in new environments, but it has not been deeply explored as applied to task planning for perception-driven robotics. We propose visual robot task planning, in which the system takes an input image and must generate a sequence of high-level actions and associated observations that achieve a given task. In this paper, we describe a neural network architecture and associated planning algorithm that (1) learns a representation of the world that can generate prospective futures, (2) uses this generative model to simulate the results of sequences of high-level actions in a variety of environments, and (3) evaluates these actions via a variant of Monte Carlo Tree Search to find a viable solution to a particular problem. Our approach allows us to visualize intermediate motion goals and to learn to plan complex activity from visual information; we use it to generate and visualize task plans on held-out examples of a block-stacking simulation.
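As a rough illustration of the planning loop described above, the sketch below runs a UCB-style tree search over a learned predictive model: a latent state is encoded from the input image, candidate high-level actions are rolled forward through the model's simulated dynamics, and the most-visited root action is returned as the plan's first step. This is a minimal sketch under stated assumptions, not the paper's implementation; the names (`PredictiveModel`, `encode`, `predict`, `value`) are hypothetical placeholders standing in for learned networks.

```python
import math

class PredictiveModel:
    """Hypothetical stand-in for the learned generative model. In the paper
    these components would be neural networks; here they are toy stubs."""

    def encode(self, image):
        # Map an observation to a latent world state.
        return sum(image) % 1000

    def predict(self, state, action):
        # Simulate the effect of one high-level action in latent space.
        return (state * 31 + action + 1) % 1000

    def value(self, state):
        # Score how promising a latent state looks for the task.
        return (state % 100) / 100.0


def plan(model, image, actions, horizon=3, rollouts=200, c=1.4):
    """UCB-style tree search over simulated action sequences; returns the
    first high-level action of the best-scoring plan."""
    root = model.encode(image)
    stats = {}  # (state, action) -> [visit count, total value]

    def rollout(state, depth):
        if depth == horizon:
            return model.value(state)
        total = sum(stats.get((state, a), [0, 0.0])[0] for a in actions) + 1

        def ucb(a):
            n, w = stats.get((state, a), [0, 0.0])
            # Unvisited actions are tried first; otherwise balance the
            # average value against an exploration bonus.
            return float("inf") if n == 0 else w / n + c * math.sqrt(math.log(total) / n)

        a = max(actions, key=ucb)
        v = rollout(model.predict(state, a), depth + 1)
        entry = stats.setdefault((state, a), [0, 0.0])
        entry[0] += 1
        entry[1] += v
        return v

    for _ in range(rollouts):
        rollout(root, 0)

    # The most-visited action at the root is the plan's first step.
    return max(actions, key=lambda a: stats.get((root, a), [0, 0.0])[0])


if __name__ == "__main__":
    model = PredictiveModel()
    image = [12, 48, 201, 7]  # toy "observation"
    print(plan(model, image, actions=[0, 1, 2]))
```

In a full system, the stubbed encoder, dynamics, and value functions would be the trained components of the architecture, and the search would operate over generated image predictions rather than toy integer states.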
