Direct Policy Transfer via Hidden Parameter Markov Decision Processes

Many situations arise in which an agent must learn to solve tasks with similar, but not identical, dynamics. For example, a robot might be tasked with manipulating objects of slightly different masses and volumes, and a clinician may treat patients whose physiologies are individually unique yet all those of human adults. In such situations, if one has already seen several instances of related tasks, it is inefficient to start learning a new task from scratch. Indeed, in some domains, such as medicine, one does not have multiple episodes in which to learn a personalized treatment policy: decisions must be optimized from only a few interactions with the patient.
