Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, and the key challenges of this complex domain are still poorly understood. In this paper, we introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations, designed to better reflect the data distributions of real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and to visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and to establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and empirically analyze the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.
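To make the claim about "simple modifications" concrete, the sketch below shows one plausible instance of such a modification: adding a TD3+BC-style behavior-cloning regularizer to a DrQ-v2-style deterministic actor update, so the learned policy stays close to the actions in the offline dataset. This is an illustrative assumption, not the paper's released code; all class and function names (Actor, Critic, offline_actor_loss, alpha) are hypothetical, and the image encoder is abstracted away as a pre-computed feature vector.

```python
# Minimal sketch (PyTorch) of a behavior-cloning-regularized actor update,
# one plausible "simple modification" for offline use of a DrQ-v2-style agent.
# Names and hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class Actor(nn.Module):
    """Tiny deterministic actor over a pre-computed latent/feature vector."""

    def __init__(self, feat_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)


class Critic(nn.Module):
    """Tiny Q-network over (feature, action) pairs."""

    def __init__(self, feat_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, feat: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([feat, act], dim=-1))


def offline_actor_loss(actor, critic, feat, behavior_act, alpha: float = 2.5):
    """TD3+BC-style loss: maximize Q while staying near the dataset actions.

    The behavior-cloning penalty discourages the policy from selecting
    out-of-distribution actions, a standard anti-extrapolation regularizer
    for offline RL; `alpha` rescales the Q term relative to that penalty.
    """
    pred_act = actor(feat)
    q = critic(feat, pred_act)
    lam = alpha / q.abs().mean().detach()          # adaptive Q-scaling
    bc = F.mse_loss(pred_act, behavior_act)        # behavior-cloning penalty
    return -lam * q.mean() + bc


if __name__ == "__main__":
    feat_dim, act_dim, batch = 50, 6, 32
    actor, critic = Actor(feat_dim, act_dim), Critic(feat_dim, act_dim)
    feat = torch.randn(batch, feat_dim)            # stands in for encoded image features
    behavior_act = torch.rand(batch, act_dim) * 2 - 1
    loss = offline_actor_loss(actor, critic, feat, behavior_act)
    loss.backward()
    print(f"actor loss: {loss.item():.3f}")
```

The analogous modification for a model-based agent would instead penalize rollouts where the learned world model is uncertain, but the structure of the change is similarly lightweight: the online algorithm is kept intact and only the policy-improvement objective is regularized against the offline data.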
