Contrastive Learning as Goal-Conditioned Reinforcement Learning

In reinforcement learning (RL), it is easier to solve a task if given a good representation. While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable, and instead equips RL algorithms with additional representation learning components (e.g., auxiliary losses, data augmentation). How can we design RL algorithms that directly acquire good representations? In this paper, instead of adding representation learning components to an existing RL algorithm, we show that (contrastive) representation learning methods can be cast as RL algorithms in their own right. To do this, we build upon prior work and apply contrastive representation learning to action-labeled trajectories, in such a way that the (inner product of) learned representations exactly corresponds to a goal-conditioned value function. We use this idea to reinterpret a prior RL method as performing contrastive learning, and then use the same idea to propose a much simpler method that achieves similar performance. Across a range of goal-conditioned RL tasks, we demonstrate that contrastive RL methods achieve higher success rates than prior non-contrastive methods, including in the offline RL setting. We also show that contrastive RL outperforms prior methods on image-based tasks, without using data augmentation or auxiliary objectives.
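To make the key idea concrete, the sketch below trains a contrastive critic on action-labeled trajectories: one encoder embeds state-action pairs, another embeds goals (future states from the same trajectory), and a binary NCE loss pushes up the inner product for true (state-action, future-goal) pairs while pushing it down for goals drawn from other trajectories in the batch. That inner product is what plays the role of a goal-conditioned value function. This is a minimal sketch in PyTorch; the encoder sizes, names, and goal-sampling scheme are illustrative assumptions, not the paper's exact implementation.

# Minimal sketch, assuming low-dimensional states/goals and a binary-NCE objective;
# network names, dimensions, and the sampling scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAEncoder(nn.Module):
    """phi(s, a): embeds a state-action pair."""
    def __init__(self, obs_dim, act_dim, repr_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                                 nn.Linear(256, repr_dim))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class GoalEncoder(nn.Module):
    """psi(g): embeds a goal (a future state)."""
    def __init__(self, obs_dim, repr_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, repr_dim))
    def forward(self, g):
        return self.net(g)

def contrastive_critic_loss(phi, psi, s, a, g_future):
    """Binary NCE: positives are goals reached later in the same trajectory,
    negatives are goals taken from other trajectories in the batch.
    The logits phi(s,a)^T psi(g) play the role of a goal-conditioned value."""
    logits = torch.einsum('id,jd->ij', phi(s, a), psi(g_future))  # [B, B] inner products
    labels = torch.eye(logits.shape[0], device=logits.device)     # diagonal = positive pairs
    return F.binary_cross_entropy_with_logits(logits, labels)

# Toy usage on random data, just to check shapes and gradients.
obs_dim, act_dim, B = 10, 3, 32
phi, psi = SAEncoder(obs_dim, act_dim), GoalEncoder(obs_dim)
s, a, g = torch.randn(B, obs_dim), torch.randn(B, act_dim), torch.randn(B, obs_dim)
loss = contrastive_critic_loss(phi, psi, s, a, g)
loss.backward()

A goal-conditioned policy would then be trained to choose actions that maximize this inner product for the commanded goal; that actor update is omitted here for brevity.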
