Curious Replay for Model-based Adaptation

Agents must be able to adapt quickly as an environment changes. We find that existing model-based reinforcement learning agents are unable to do this well, in part because of how they use past experiences to train their world model. Here, we present Curious Replay -- a form of prioritized experience replay tailored to model-based agents through the use of a curiosity-based priority signal. Agents using Curious Replay exhibit improved performance in an exploration paradigm inspired by animal behavior and on the Crafter benchmark. DreamerV3 with Curious Replay surpasses state-of-the-art performance on Crafter, achieving a mean score of 19.4 that substantially improves on the previous high score of 14.5 set by DreamerV3 with uniform replay, while also maintaining similar performance on the DeepMind Control Suite. Code for Curious Replay is available at https://github.com/AutonomousAgentsLab/curiousreplay
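The abstract only describes the idea at a high level: sample replayed experience in proportion to a curiosity-based priority rather than uniformly. The sketch below is a minimal, hypothetical illustration of that idea, assuming the world model's per-transition loss is used as the curiosity signal; the class name, hyperparameters, and priority definition are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of curiosity-prioritized experience replay (illustrative only;
# names and the loss-as-priority choice are assumptions, not the released code).
import numpy as np


class CuriousReplayBuffer:
    """Replay buffer that samples transitions in proportion to a
    curiosity-based priority, e.g. the world model's loss on that transition."""

    def __init__(self, capacity, alpha=0.7, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every transition sampleable
        self.items = []             # stored transitions
        self.priorities = []        # one priority per stored transition

    def add(self, transition, initial_priority=1.0):
        # New experience starts with a nonzero priority so it gets trained on soon.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(initial_priority)

    def sample(self, batch_size):
        # Sampling probability grows with priority (tempered by alpha).
        p = np.asarray(self.priorities) + self.eps
        probs = p ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.items), size=batch_size, p=probs)
        return idx, [self.items[i] for i in idx]

    def update_priorities(self, idx, model_losses):
        # After a world-model training step, reuse the per-transition loss
        # (a proxy for curiosity / surprise) as the new priority.
        for i, loss in zip(idx, model_losses):
            self.priorities[i] = float(loss)
```

In use, the agent would sample a batch with `sample`, train the world model on it, and then call `update_priorities` with the resulting per-transition losses, so experiences the model still predicts poorly are replayed more often than ones it has already mastered.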
