Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation

We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, such as the manipulation of deformable objects. Planning is performed in a low-dimensional latent state space that embeds images. We define and implement a Latent Space Roadmap (LSR), a graph-based structure that globally captures the latent system dynamics. Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between consecutive images in the plan. We demonstrate the effectiveness of the method on a simulated box-stacking task and on a T-shirt folding task performed with a real robot.
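
The planning loop described above can be summarized as: encode the start and goal images into the latent space, snap them to their nearest roadmap nodes, search the LSR for a shortest path, decode the path into a visual plan, and query the APN for the action linking each consecutive pair of latent states. Below is a minimal Python sketch of this loop using NetworkX. The functions encode, decode, and propose_action are hypothetical stand-ins for the trained VFM encoder/decoder and the APN, and the node-merging rule (a fixed radius in latent space) is a simplifying assumption for illustration, not the authors' exact construction.

    import numpy as np
    import networkx as nx

    LATENT_DIM = 8

    def encode(image):
        # Stand-in for the VFM encoder: maps an image to a latent state.
        return np.asarray(image, dtype=float).reshape(-1)[:LATENT_DIM]

    def decode(z):
        # Stand-in for the VFM decoder: maps a latent state back to an image.
        return z

    def propose_action(z_from, z_to):
        # Stand-in for the APN: predicts the action linking two latent states.
        return {"delta": z_to - z_from}

    def build_lsr(latent_pairs, merge_radius):
        """Build the Latent Space Roadmap: merge nearby latent states into
        nodes and add an edge for every observed action-induced transition."""
        graph = nx.Graph()
        centroids = []

        def node_for(z):
            for i, c in enumerate(centroids):
                if np.linalg.norm(z - c) < merge_radius:
                    return i  # reuse an existing node for nearby states
            centroids.append(z)
            graph.add_node(len(centroids) - 1, z=z)
            return len(centroids) - 1

        for z_start, z_end in latent_pairs:
            graph.add_edge(node_for(z_start), node_for(z_end))
        return graph

    def plan(graph, start_image, goal_image):
        """Produce a visual plan (decoded images) and the actions between
        consecutive states via shortest-path search on the roadmap."""
        def nearest(z):
            return min(graph.nodes,
                       key=lambda n: np.linalg.norm(graph.nodes[n]["z"] - z))

        path = nx.shortest_path(graph,
                                nearest(encode(start_image)),
                                nearest(encode(goal_image)))
        visual_plan = [decode(graph.nodes[n]["z"]) for n in path]
        actions = [propose_action(graph.nodes[a]["z"], graph.nodes[b]["z"])
                   for a, b in zip(path, path[1:])]
        return visual_plan, actions

    # Toy usage on a chain of synthetic latent transitions.
    rng = np.random.default_rng(0)
    states = [rng.normal(size=LATENT_DIM) for _ in range(6)]
    lsr = build_lsr(list(zip(states, states[1:])), merge_radius=1.0)
    images, actions = plan(lsr, decode(states[0]), decode(states[-1]))

In the real system the roadmap edges come from observed action-labeled image pairs, so shortest-path search returns only transitions the system has evidence it can execute.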
