AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose AdaRL, a principled framework for adaptive RL that adapts reliably to changes across domains. Specifically, we construct a generative environment model that captures the structural relationships among the variables in the system and encodes cross-domain changes in a compact way, providing a clear and interpretable picture of what and where the changes are and how to adapt to them. Based on this environment model, we characterize a minimal set of representations, comprising both domain-specific factors and domain-shared state representations, that suffices for reliable and low-cost transfer. Moreover, we show that by explicitly encoding the changes in a compact representation, we can adapt the policy with only a few target-domain samples and without further policy optimization in the target domain. We illustrate the efficacy of AdaRL through a series of experiments that vary different components of Cartpole and Atari games.
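To make the adaptation idea concrete, the following is a minimal sketch rather than the paper's implementation: a domain-shared transition network conditioned on a low-dimensional, domain-specific change factor theta_k, so that adapting to a new domain means estimating only theta from a few target transitions while the shared network stays frozen. The PyTorch setup and all names here (FactoredDynamicsModel, adapt_to_target, change_dim) are illustrative assumptions, not artifacts from the paper.

```python
import torch
import torch.nn as nn


class FactoredDynamicsModel(nn.Module):
    """Domain-shared dynamics f(s, a, theta_k) with a compact,
    domain-specific change factor theta_k (one vector per source domain).
    Illustrative sketch; not the paper's actual architecture."""

    def __init__(self, state_dim, action_dim, num_domains, change_dim=2, hidden=64):
        super().__init__()
        # Low-dimensional change factors: one learned embedding per domain.
        self.theta = nn.Embedding(num_domains, change_dim)
        # Shared transition network, reused across all domains.
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + change_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, domain_idx):
        theta_k = self.theta(domain_idx)  # (batch, change_dim)
        return self.net(torch.cat([state, action, theta_k], dim=-1))


def adapt_to_target(model, states, actions, next_states, steps=200, lr=1e-2):
    """Estimate only a new change factor from a few target-domain transitions;
    the shared dynamics network is frozen, so only change_dim parameters move."""
    theta_new = nn.Parameter(torch.zeros(1, model.theta.embedding_dim))
    opt = torch.optim.Adam([theta_new], lr=lr)
    for p in model.net.parameters():
        p.requires_grad_(False)
    for _ in range(steps):
        x = torch.cat(
            [states, actions, theta_new.expand(states.size(0), -1)], dim=-1)
        loss = ((model.net(x) - next_states) ** 2).mean()  # one-step MSE
        opt.zero_grad()
        loss.backward()
        opt.step()
    return theta_new.detach()
```

Because only a handful of parameters (change_dim of them) are re-estimated for the target domain, a few transitions can suffice, which mirrors the abstract's claim of low-cost transfer without further policy optimization in the target domain.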
