Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, to date, offline reinforcement learning from visual observations has been relatively under-explored, and there is a lack of understanding of where the remaining challenges lie. In this paper, we seek to establish simple baselines for continuous control in the visual domain. We show that simple modifications to two state-of-the-art vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform prior work and establish a competitive baseline. We rigorously evaluate these algorithms on both existing offline datasets and a new testbed for offline reinforcement learning from visual observations that better represents the data distributions present in real-world offline reinforcement learning problems, and open-source our code and data to facilitate progress in this important domain. Finally, we present and analyze several key desiderata unique to offline RL from visual observations, including visual distractions and visually identifiable changes in dynamics.

DreamerV2 is a model-based algorithm that learns a world model from image observations [19, 20] and predicts ahead using compact model latent states. The particular instantiation used in DreamerV2 uses model states s_t containing a deterministic component h_t, implemented as the recurrent state of a Gated Recurrent Unit (GRU, [9]), and a stochastic component z_t with a categorical distribution. The actor and critic are trained from imagined trajectories of latent states, starting at encoded states of previously encountered sequences.
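To make the structure of that model state concrete, the sketch below implements a single DreamerV2-style latent transition: a GRU carries the deterministic component h_t, independent categorical variables provide the stochastic component z_t, and prior-only steps let the model imagine trajectories forward without images. This is a minimal illustrative sketch in PyTorch; the class and parameter names (RSSM, step, deter_dim, num_vars, and so on) and all layer sizes are assumptions made for exposition, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RSSM(nn.Module):
    """Recurrent state-space model: deterministic GRU state h_t plus a
    stochastic categorical state z_t, in the style of DreamerV2."""

    def __init__(self, action_dim, embed_dim=256, deter_dim=200,
                 num_vars=32, num_classes=32, hidden_dim=200):
        super().__init__()
        self.num_vars, self.num_classes = num_vars, num_classes
        stoch_dim = num_vars * num_classes
        # Deterministic path: a GRU driven by the previous (z_{t-1}, a_{t-1}).
        self.gru_in = nn.Linear(stoch_dim + action_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, deter_dim)
        # Prior p(z_t | h_t): used when imagining ahead without observations.
        self.prior_logits = nn.Linear(deter_dim, stoch_dim)
        # Posterior q(z_t | h_t, e_t): used when encoding observed sequences.
        self.post_logits = nn.Linear(deter_dim + embed_dim, stoch_dim)

    def _sample(self, logits):
        # Straight-through sample from independent categorical variables,
        # so gradients flow through the one-hot draws.
        logits = logits.view(-1, self.num_vars, self.num_classes)
        dist = torch.distributions.OneHotCategoricalStraightThrough(logits=logits)
        return dist.rsample().flatten(1)

    def step(self, prev_z, prev_action, prev_h, embed=None):
        """One transition, returning (h_t, z_t). If `embed` (the encoded image
        e_t) is provided, z_t comes from the posterior; otherwise the prior."""
        x = F.elu(self.gru_in(torch.cat([prev_z, prev_action], dim=-1)))
        h = self.gru(x, prev_h)
        if embed is None:
            logits = self.prior_logits(h)                             # imagination
        else:
            logits = self.post_logits(torch.cat([h, embed], dim=-1))  # encoding
        z = self._sample(logits)
        return h, z


# Imagined rollout from a (here zero-initialized) starting state; in training,
# the start would be an encoded state from a previously collected sequence and
# actions would come from the learned actor rather than random noise.
rssm = RSSM(action_dim=6)
h = torch.zeros(16, 200)
z = torch.zeros(16, 32 * 32)
for _ in range(15):
    action = torch.randn(16, 6)
    h, z = rssm.step(z, action, h)            # prior-only: no images needed
    model_state = torch.cat([h, z], dim=-1)   # s_t = (h_t, z_t) for actor/critic
```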

[1] Shimon Whiteson, et al. In Defense of the Unitary Scalarization for Deep Multi-Task Learning, 2022.

[2] Sergey Levine, et al. Offline Reinforcement Learning with Implicit Q-Learning, 2021, ICLR.

[3] Michael A. Osborne, et al. Revisiting Design Choices in Offline Model-Based Reinforcement Learning, 2021, ICLR.

[4] Alessandro Lazaric, et al. Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning, 2021, ICLR.

[5] S. Levine, et al. Should I Run Offline Reinforcement Learning or Behavioral Cloning?, 2022, ICLR.

[6] Jonathan Tompson, et al. Implicit Behavioral Cloning, 2021, CoRL.

[7] Silvio Savarese, et al. What Matters in Learning from Offline Human Demonstrations for Robot Manipulation, 2021, CoRL.

[8] Stefano Ermon, et al. Temporal Predictive Coding For Model-Based Planning In Latent Space, 2021, ICML.

[9] Scott Fujimoto, et al. A Minimalist Approach to Offline Reinforcement Learning, 2021, NeurIPS.

[10] Sergey Levine, et al. Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills, 2021, ICML.

[11] Stephen Roberts, et al. Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment, 2021, ICML.

[12] Rob Fergus, et al. Decoupling Value and Policy for Generalization in Reinforcement Learning, 2021, ICML.

[13] Sergey Levine, et al. COMBO: Conservative Offline Model-Based Policy Optimization, 2021, NeurIPS.

[14] Rico Jonschkowski, et al. The Distracting Control Suite - A Challenging Benchmark for Reinforcement Learning from Pixels, 2021, arXiv.

[15] Chelsea Finn, et al. Offline Reinforcement Learning from Images with Latent Space Models, 2020, L4DC.

[16] Mohammad Norouzi, et al. Mastering Atari with Discrete World Models, 2020, ICLR.

[17] Gabriel Dulac-Arnold, et al. Model-Based Offline Planning, 2020, ICLR.

[18] T. Taniguchi, et al. Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction, 2020, 2021 IEEE International Conference on Robotics and Automation (ICRA).

[19] R. Fergus, et al. Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels, 2020, ICLR.

[20] Joelle Pineau, et al. Learning Robust State Abstractions for Hidden-Parameter Block MDPs, 2021, ICLR.

[21] Y. Gal, et al. VariBAD: Variational Bayes-Adaptive Deep RL via Meta-Learning, 2021, J. Mach. Learn. Res.

[22] Matthias Bethge, et al. Improving robustness against common corruptions by covariate shift adaptation, 2020, NeurIPS.

[23] Yuval Tassa, et al. dm_control: Software and Tasks for Continuous Control, 2020, Softw. Impacts.

[24] Jaime Fernández del Río, et al. Array programming with NumPy, 2020, Nature.

[25] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.

[26] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[27] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.

[28] Pieter Abbeel, et al. Planning to Explore via Self-Supervised World Models, 2020, ICML.

[29] T. Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.

[30] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, arXiv.

[31] P. Abbeel, et al. Reinforcement Learning with Augmented Data, 2020, NeurIPS.

[32] Justin Fu, et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning, 2020, arXiv.

[33] Xingyou Song, et al. Observational Overfitting in Reinforcement Learning, 2019, ICLR.

[34] Jimmy Ba, et al. Dream to Control: Learning Behaviors by Latent Imagination, 2019, ICLR.

[35] Oleg O. Sushkov, et al. Scaling data-driven robotics with reward sketching and batch reinforcement learning, 2019, Robotics: Science and Systems.

[36] Rishabh Agarwal, et al. An Optimistic Perspective on Offline Reinforcement Learning, 2019, ICML.

[37] Trevor Darrell, et al. BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling, 2018, arXiv.

[38] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, arXiv.

[39] Sergey Levine, et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, 2019, NeurIPS.

[40] Sergey Levine, et al. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, 2019, ICML.

[41] Noah A. Smith, et al. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks, 2019, RepL4NLP@ACL.

[42] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.

[43] Yee Whye Teh, et al. Disentangling Disentanglement in Variational Autoencoders, 2018, ICML.

[44] Ruben Villegas, et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.

[45] David Janz, et al. Learning to Drive in a Day, 2018, 2019 International Conference on Robotics and Automation (ICRA).

[46] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[47] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, arXiv.

[48] Yiting Xie, et al. Pre-training on Grayscale ImageNet Improves Medical Image Classification, 2018, ECCV Workshops.

[49] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.

[50] Guillaume Desjardins, et al. Understanding disentangling in β-VAE, 2018, arXiv.

[51] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.

[52] David Filliat, et al. State Representation Learning for Control: An Overview, 2018, Neural Networks.

[53] Finale Doshi-Velez, et al. Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes, 2017, AAAI.

[54] Paul Newman, et al. 1 year, 1000 km: The Oxford RobotCar dataset, 2017, Int. J. Robotics Res.

[55] Charles Blundell, et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2016, NIPS.

[56] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[57] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, arXiv.

[58] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[59] Peter Stone, et al. Reinforcement learning, 2019, Scholarpedia.

[60] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.

[61] Ivan Bratko, et al. Behavioural Cloning: Phenomena, Results and Problems, 1995.

[62] Claude Sammut, et al. A Framework for Behavioural Cloning, 1995, Machine Intelligence 15.

[63] A. Weigend, et al. Estimating the mean and variance of the target probability distribution, 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).