Contextualize Me - The Case for Context in Reinforcement Learning

While Reinforcement Learning (RL) has made great strides towards solving increasingly complicated problems, many algorithms remain brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner, thereby enabling flexible, precise and interpretable task specification and generation. Our goal is to show how the framework of cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks. We confirm the insight that optimal behavior in cRL requires context information, as is the case in other related areas of partial observability. To empirically validate this in the cRL framework, we provide various context-extended versions of common RL environments. These form the first benchmark library designed for generalization based on cRL extensions of popular benchmarks, CARL, which we propose as a testbed to further study general agents. We show that in the contextual setting, even simple RL environments become challenging, and that naive solutions are not enough to generalize across complex context spaces.
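To make the notion of a context-extended environment concrete, the sketch below shows one way a contextual MDP can be realized in code: a toy cart-pole whose transition dynamics depend on a context (here, gravity and pole length) sampled anew at each reset. This is a minimal illustration of the idea, not the CARL API; the class and field names are hypothetical, and the dynamics are deliberately simplified.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical context: physical parameters varied between episodes."""
    gravity: float = 9.8
    pole_length: float = 0.5


class ContextualCartPole:
    """Toy contextual MDP: fixed cart-pole-like dynamics, but each episode
    is instantiated with a context drawn from a user-supplied set."""

    def __init__(self, contexts):
        self.contexts = contexts  # candidate Context instances
        self.context = None
        self.state = None

    def reset(self, rng=random):
        # Sample a context; exposing it to the agent is what distinguishes
        # cRL from treating context changes as unobserved non-stationarity.
        self.context = rng.choice(self.contexts)
        self.state = [0.0, 0.0, 0.0, 0.0]  # x, x_dot, theta, theta_dot
        return self.state, self.context

    def step(self, action):
        g, length = self.context.gravity, self.context.pole_length
        x, x_dot, theta, theta_dot = self.state
        force = 10.0 if action == 1 else -10.0
        dt = 0.02
        # Simplified Euler update; angular acceleration depends on context,
        # so the optimal policy differs across contexts.
        theta_acc = (g * math.sin(theta) + force * math.cos(theta)) / length
        x_acc = force  # mass terms dropped for brevity
        self.state = [x + dt * x_dot, x_dot + dt * x_acc,
                      theta + dt * theta_dot, theta_dot + dt * theta_acc]
        done = abs(theta) > 0.21  # roughly 12 degrees
        return self.state, 1.0, done
```

An agent trained only on `Context(gravity=9.8)` and evaluated on `Context(gravity=3.7)` would face exactly the zero-shot generalization problem the paper studies; whether the agent sees the returned context determines whether the problem is fully or partially observable.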
