Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models

Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space. However, learning world models in unconstrained environments over high-dimensional observation spaces such as images is challenging. One source of difficulty is the presence of irrelevant but hard-to-model background distractions, and unimportant visual details of task-relevant entities. We address this issue by learning a recurrent latent dynamics model which contrastively predicts the next observation. This simple model leads to surprisingly robust robotic control even with simultaneous camera, background, and color distractions. We outperform alternatives such as bisimulation methods which impose state-similarity measures derived from divergence in future reward or future optimal actions. We obtain state-of-the-art results on the Distracting Control Suite, a challenging benchmark for pixel-based robotic control.
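The core idea of contrastive next-observation prediction can be sketched as an InfoNCE-style objective: the latent predicted by the recurrent dynamics model should score high against the encoding of its own next observation, with the other encodings in the batch serving as negatives. The snippet below is a minimal illustrative sketch, not the paper's implementation; the function name, temperature value, and use of cosine similarity are assumptions.

```python
import numpy as np

def info_nce_loss(z_pred, e_next, temperature=0.1):
    """InfoNCE loss for contrastive next-step prediction.

    z_pred:  (B, D) latents predicted by the recurrent dynamics model.
    e_next:  (B, D) encodings of the actual next observations.
    Row i of z_pred is a positive pair with row i of e_next; all other
    rows in the batch act as negatives.
    """
    # Cosine similarity: normalize both sets of vectors.
    z = z_pred / np.linalg.norm(z_pred, axis=1, keepdims=True)
    e = e_next / np.linalg.norm(e_next, axis=1, keepdims=True)
    logits = z @ e.T / temperature                # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: prediction i vs. next observation i.
    return -np.mean(np.diag(log_probs))
```

Because the loss only asks the latent to discriminate the true next observation from alternatives, rather than to reconstruct it pixel by pixel, the model is not forced to spend capacity on hard-to-model background distractors.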
