Contrastive Learning as Goal-Conditioned Reinforcement Learning

In reinforcement learning (RL), it is easier to solve a task if given a good representation. While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable, and instead equips RL algorithms with additional representation learning components (e.g., auxiliary losses, data augmentation). How can we design RL algorithms that directly acquire good representations? In this paper, instead of adding representation learning components to an existing RL algorithm, we show that (contrastive) representation learning methods can be cast as RL algorithms in their own right. To do this, we build upon prior work and apply contrastive representation learning to action-labeled trajectories, in such a way that the (inner product of) learned representations exactly corresponds to a goal-conditioned value function. We use this idea to reinterpret a prior RL method as performing contrastive learning, and then use the same idea to propose a much simpler method that achieves similar performance. Across a range of goal-conditioned RL tasks, we demonstrate that contrastive RL methods achieve higher success rates than prior non-contrastive methods, including in the offline RL setting. We also show that contrastive RL outperforms prior methods on image-based tasks, without using data augmentation or auxiliary objectives.
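To make the key idea concrete, the sketch below trains a contrastive critic on action-labeled trajectories: one encoder embeds state-action pairs, another embeds goals (future states from the same trajectory), and a binary NCE loss pushes up the inner product for true (state-action, future-goal) pairs while pushing it down for goals drawn from other trajectories in the batch. That inner product is what plays the role of a goal-conditioned value function. This is a minimal sketch in PyTorch; the encoder sizes, names, and goal-sampling scheme are illustrative assumptions, not the paper's exact implementation.

# Minimal sketch, assuming low-dimensional states/goals and a binary-NCE objective;
# network names, dimensions, and the sampling scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAEncoder(nn.Module):
    """phi(s, a): embeds a state-action pair."""
    def __init__(self, obs_dim, act_dim, repr_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                                 nn.Linear(256, repr_dim))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class GoalEncoder(nn.Module):
    """psi(g): embeds a goal (a future state)."""
    def __init__(self, obs_dim, repr_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, repr_dim))
    def forward(self, g):
        return self.net(g)

def contrastive_critic_loss(phi, psi, s, a, g_future):
    """Binary NCE: positives are goals reached later in the same trajectory,
    negatives are goals taken from other trajectories in the batch.
    The logits phi(s,a)^T psi(g) play the role of a goal-conditioned value."""
    logits = torch.einsum('id,jd->ij', phi(s, a), psi(g_future))  # [B, B] inner products
    labels = torch.eye(logits.shape[0], device=logits.device)     # diagonal = positive pairs
    return F.binary_cross_entropy_with_logits(logits, labels)

# Toy usage on random data, just to check shapes and gradients.
obs_dim, act_dim, B = 10, 3, 32
phi, psi = SAEncoder(obs_dim, act_dim), GoalEncoder(obs_dim)
s, a, g = torch.randn(B, obs_dim), torch.randn(B, act_dim), torch.randn(B, obs_dim)
loss = contrastive_critic_loss(phi, psi, s, a, g)
loss.backward()

A goal-conditioned policy would then be trained to choose actions that maximize this inner product for the commanded goal; that actor update is omitted here for brevity.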
