AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

One practical challenge in reinforcement learning (RL) is how to adapt quickly when faced with new environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably and efficiently to changes across domains with only a few samples from the target domain, even in partially observable environments. Specifically, we leverage a parsimonious graphical representation that characterizes structural relationships over the variables in the RL system. Such a graphical representation provides a compact way to encode what and where the changes across domains are, and it further identifies the minimal set of changes one has to consider for the purpose of policy adaptation. We show that by explicitly leveraging this compact representation of changes, we can adapt the policy to the target domain efficiently: only a few target-domain samples are needed, and further policy optimization is avoided. We illustrate the efficacy of AdaRL through a series of experiments that vary factors in the observation, transition, and reward functions for Cartpole and Atari games.
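To make the core idea concrete, the sketch below illustrates (under strong simplifying assumptions, and not as the paper's implementation) how a model factored into a shared part and a low-dimensional, domain-specific change factor can be adapted to a new domain from only a handful of transitions: the shared dynamics and the change direction are assumed already learned from source domains, and adaptation reduces to re-estimating a scalar change factor by least squares. All names (change_dir, theta, sample_transitions) are hypothetical.

```python
"""Minimal, illustrative sketch of few-sample domain adaptation with a
shared model plus a low-dimensional change factor (assumptions, not AdaRL's code)."""
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # state dimension
A_shared = 0.9 * np.eye(d)             # shared dynamics, assumed learned from source domains
change_dir = rng.normal(size=d)        # direction along which domains differ, assumed identified

def sample_transitions(theta, n):
    """Simulate n transitions from a domain whose change factor is theta."""
    S = rng.normal(size=(n, d))
    noise = 0.01 * rng.normal(size=(n, d))
    S_next = S @ A_shared.T + theta * change_dir + noise
    return S, S_next

# Target domain with an unknown change factor; only a few samples are available.
theta_target = 0.7
S, S_next = sample_transitions(theta_target, n=10)

# The residual left after the shared dynamics is approximately theta * change_dir,
# so theta can be recovered by a one-dimensional least-squares fit.
residual = S_next - S @ A_shared.T
theta_hat = float((residual @ change_dir).sum() / (len(S) * change_dir @ change_dir))
print(f"true theta = {theta_target:.3f}, estimate from 10 samples = {theta_hat:.3f}")
```

Because only the low-dimensional change factor is re-estimated while the shared structure is reused, the target domain requires far fewer samples than learning a model or policy from scratch; this is the spirit of the compact change encoding described above.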
