AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose AdaRL, a principled framework for adaptive RL that adapts reliably to changes across domains. Specifically, we construct a generative environment model that captures the structural relationships among the variables in the system and encodes cross-domain changes in a compact way, providing a clear and interpretable picture of what and where the changes are and how to adapt to them. Based on this environment model, we characterize a minimal set of representations, comprising both domain-specific factors and domain-shared state representations, that suffices for reliable and low-cost transfer. Moreover, we show that by explicitly encoding the changes in a compact representation, we can adapt the policy with only a few target-domain samples and without further policy optimization in the target domain. We illustrate the efficacy of AdaRL through a series of experiments that vary different components of Cartpole and Atari games.
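To make the adaptation idea concrete, the following is a minimal sketch rather than the paper's implementation: a domain-shared transition network conditioned on a low-dimensional, domain-specific change factor theta_k, so that adapting to a new domain means estimating only theta from a few target transitions while the shared network stays frozen. The PyTorch setup and all names here (FactoredDynamicsModel, adapt_to_target, change_dim) are illustrative assumptions, not artifacts from the paper.

```python
import torch
import torch.nn as nn


class FactoredDynamicsModel(nn.Module):
    """Domain-shared dynamics f(s, a, theta_k) with a compact,
    domain-specific change factor theta_k (one vector per source domain).
    Illustrative sketch; not the paper's actual architecture."""

    def __init__(self, state_dim, action_dim, num_domains, change_dim=2, hidden=64):
        super().__init__()
        # Low-dimensional change factors: one learned embedding per domain.
        self.theta = nn.Embedding(num_domains, change_dim)
        # Shared transition network, reused across all domains.
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + change_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, domain_idx):
        theta_k = self.theta(domain_idx)  # (batch, change_dim)
        return self.net(torch.cat([state, action, theta_k], dim=-1))


def adapt_to_target(model, states, actions, next_states, steps=200, lr=1e-2):
    """Estimate only a new change factor from a few target-domain transitions;
    the shared dynamics network is frozen, so only change_dim parameters move."""
    theta_new = nn.Parameter(torch.zeros(1, model.theta.embedding_dim))
    opt = torch.optim.Adam([theta_new], lr=lr)
    for p in model.net.parameters():
        p.requires_grad_(False)
    for _ in range(steps):
        x = torch.cat(
            [states, actions, theta_new.expand(states.size(0), -1)], dim=-1)
        loss = ((model.net(x) - next_states) ** 2).mean()  # one-step MSE
        opt.zero_grad()
        loss.backward()
        opt.step()
    return theta_new.detach()
```

Because only a handful of parameters (change_dim of them) are re-estimated for the target domain, a few transitions can suffice, which mirrors the abstract's claim of low-cost transfer without further policy optimization in the target domain.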
