CURL: Contrastive Unsupervised Representations for Reinforcement Learning

We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features. CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and on Atari games, showing 1.9x and 1.2x performance gains on the 100K environment-step and 100K interaction-step benchmarks, respectively. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample efficiency of methods that use state-based features. Our code is open-sourced and available at this https URL.
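
The abstract describes CURL only at a high level: a contrastive objective shapes the pixel encoder, and an off-policy RL algorithm consumes its features. The sketch below illustrates the contrastive half under assumptions this abstract does not spell out: two independent random crops of each observation form a positive pair, scores use a bilinear product with a learned matrix `W`, the loss is InfoNCE with the other samples in the batch as negatives, and the key encoder is a momentum (moving-average) copy of the query encoder. The linear `encode` function, the coefficient `tau`, and all sizes are hypothetical placeholders, and the off-policy learner is omitted.

```python
import numpy as np

def random_crop(imgs, out_size, rng):
    """Crop each image in a batch of shape (N, C, H, W) at an independent random offset."""
    n, _, h, w = imgs.shape
    tops = rng.integers(0, h - out_size + 1, size=n)
    lefts = rng.integers(0, w - out_size + 1, size=n)
    return np.stack([img[:, t:t + out_size, l:l + out_size]
                     for img, t, l in zip(imgs, tops, lefts)])

def encode(imgs, theta):
    """Hypothetical linear encoder standing in for a conv net: flatten, then project."""
    return imgs.reshape(len(imgs), -1) @ theta

def info_nce_loss(z_q, z_k, W):
    """InfoNCE with an assumed bilinear score z_q[i]^T W z_k[j].

    Row i's positive is key i (the other crop of the same observation);
    the remaining keys in the batch act as negatives.
    """
    logits = z_q @ W @ z_k.T                              # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy on the diagonal

def momentum_update(theta_k, theta_q, tau=0.05):
    """Key-encoder weights track an exponential moving average of the query encoder."""
    return (1.0 - tau) * theta_k + tau * theta_q

# Illustrative usage with made-up sizes (not the paper's settings).
rng = np.random.default_rng(0)
obs = rng.standard_normal((8, 3, 20, 20))            # batch of 8 RGB observations
theta_q = rng.standard_normal((3 * 16 * 16, 32)) * 0.01
theta_k = theta_q.copy()
W = np.eye(32)

z_q = encode(random_crop(obs, 16, rng), theta_q)     # anchors (query encoder)
z_k = encode(random_crop(obs, 16, rng), theta_k)     # positives (key encoder)
loss = info_nce_loss(z_q, z_k, W)
theta_k = momentum_update(theta_k, theta_q)
print(f"contrastive loss: {loss:.4f}")
```

In a full agent, gradients from this loss would update the query encoder and `W` alongside the RL losses, while the key encoder would change only through the moving-average step; this NumPy sketch computes no gradients and is meant only to show the shape of the objective.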
