Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap

We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on manipulation of deformable objects. We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. Our framework consists of three parts: (1) a Mapping Module (MM) that maps observations, given in the form of images, into a structured latent space, extracting the respective states, and that generates observations from the latent states; (2) the LSR, which builds and connects clusters containing similar states in order to find latent plans between the start and goal states extracted by the MM; and (3) the Action Proposal Module, which complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on two simulated box stacking tasks and a folding task executed on a real robot.
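The roadmap idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the latent states are given directly as small tuples (standing in for the output of the MM), clusters are formed by a simple greedy distance threshold `eps`, the graph is directed, and each edge stores one observed action label in place of a learned Action Proposal Module. All function names, the threshold, and the toy data are hypothetical.

```python
from collections import deque
from math import dist


def cluster_states(states, eps):
    """Greedy clustering (an assumption): a state joins the first cluster
    whose seed lies within eps, otherwise it starts a new cluster."""
    seeds, labels = [], []
    for z in states:
        for cid, seed in enumerate(seeds):
            if dist(z, seed) <= eps:
                labels.append(cid)
                break
        else:
            seeds.append(z)
            labels.append(len(seeds) - 1)
    return seeds, labels


def build_roadmap(labels, transitions):
    """Connect clusters using observed transitions (i, j, action), where
    i and j index the list of latent states."""
    edges = {}
    for i, j, action in transitions:
        a, b = labels[i], labels[j]
        if a != b:  # transitions inside one cluster add no edge
            edges.setdefault(a, {}).setdefault(b, action)
    return edges


def nearest_cluster(seeds, z):
    """Index of the cluster whose seed is closest to latent state z."""
    return min(range(len(seeds)), key=lambda c: dist(z, seeds[c]))


def plan(seeds, edges, z_start, z_goal):
    """Breadth-first search over clusters; returns (cluster path, actions)
    or None when the goal cluster is unreachable."""
    start = nearest_cluster(seeds, z_start)
    goal = nearest_cluster(seeds, z_goal)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in edges.get(node, {}):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if goal not in parent:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    actions = [edges[a][b] for a, b in zip(path, path[1:])]
    return path, actions


# Toy 2-D latent states and observed transitions (hypothetical data).
states = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.1), (2.0, 0.0)]
transitions = [(0, 2, "pick-A"), (1, 3, "pick-A"), (3, 4, "place-B")]

seeds, labels = cluster_states(states, eps=0.3)
edges = build_roadmap(labels, transitions)
print(plan(seeds, edges, (0.05, 0.0), (1.9, 0.0)))
# prints ([0, 1, 2], ['pick-A', 'place-B'])
```

In this sketch the plan is a sequence of cluster indices; in a full system each cluster representative would be decoded back into an image so that the plan can be inspected visually.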
