Comparing Reconstruction- and Contrastive-based Models for Visual Task Planning

Learning state representations enables robotic planning directly from raw observations such as images. Most methods learn state representations by utilizing losses based on the reconstruction of the raw observations from a lower-dimensional latent space. Similarity between observations in image space is often assumed to be a proxy for similarity between the underlying states of the system. However, observations commonly contain task-irrelevant factors of variation that are nonetheless important for reconstruction, such as varying lighting and different camera viewpoints. In this work, we define relevant evaluation metrics and perform a thorough study of different loss functions for state representation learning. We show that models exploiting task priors, such as Siamese networks with a simple contrastive loss, outperform reconstruction-based representations in visual task planning.
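
To make the contrastive alternative concrete, the sketch below shows one common way a Siamese encoder can be trained with the pairwise contrastive loss of Hadsell et al. (2006): observation pairs labelled as sharing the same underlying task state are pulled together in latent space, while other pairs are pushed at least a margin apart. The encoder architecture, input resolution, margin, and pair-labelling scheme here are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch (assumed setup, not the paper's implementation): a Siamese
# encoder trained with a pairwise contrastive loss. Both observations in a
# pair pass through the same weight-shared encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseEncoder(nn.Module):
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        # Hypothetical small CNN encoder for 64x64 RGB observations.
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                     same_state: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """same_state is 1 for pairs with the same underlying task state, else 0."""
    d = F.pairwise_distance(z1, z2)
    pull = same_state * d.pow(2)                      # attract similar pairs
    push = (1.0 - same_state) * F.relu(margin - d).pow(2)  # repel dissimilar pairs
    return (pull + push).mean()


# Usage: encode a batch of observation pairs and back-propagate the loss.
encoder = SiameseEncoder()
x1, x2 = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(encoder(x1), encoder(x2), labels)
loss.backward()
```

Because the loss only requires a notion of which observation pairs correspond to the same task state, it can ignore task-irrelevant nuisance factors (lighting, viewpoint) that a reconstruction objective would be forced to model.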
