Importance Weighted Transfer of Samples in Reinforcement Learning

We consider the transfer of experience samples (i.e., tuples ⟨s, a, s', r⟩) in reinforcement learning (RL), collected from a set of source tasks, to improve the learning process in a given target task. Most related approaches focus on selecting the source samples most relevant to the target task, but then use all the transferred samples without any further consideration of the discrepancies between the task models. In this paper, we propose a model-based technique that automatically estimates the relevance (importance weight) of each source sample for solving the target task. In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight. By extending the results for importance weighting available in the supervised learning literature, we develop a finite-sample analysis of the proposed batch RL algorithm. Furthermore, we empirically compare the proposed algorithm to state-of-the-art approaches, showing that it achieves better learning performance and is robust to negative transfer, even when some source tasks differ significantly from the target task.
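To illustrate how importance weights can scale each sample's contribution to a regression step, the following is a minimal sketch rather than the algorithm analyzed in the paper. It assumes one-dimensional states, known Gaussian reward models with a shared noise level, and a linear regressor; all function names, the two hypothetical source tasks, and the numerical values are illustrative assumptions. Each weight is the ratio between the density of the observed reward under the target model and its density under the source model that generated it.

```python
import math
import random

NOISE_STD = 0.1  # assumed reward noise, shared by all tasks


def gaussian_pdf(y, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weight(y, target_mean, source_mean, std):
    """Density ratio p_target(y | x) / p_source(y | x) for Gaussian models."""
    return gaussian_pdf(y, target_mean, std) / gaussian_pdf(y, source_mean, std)


def weighted_line_fit(xs, ys, ws):
    """Return (intercept, slope) minimising sum_i w_i * (a + b * x_i - y_i)^2."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return y_bar - slope * x_bar, slope


# Hypothetical mean-reward models: one source matches the target, one does not.
def target_reward(x):
    return 2.0 * x


def similar_source_reward(x):
    return 2.0 * x


def dissimilar_source_reward(x):
    return 2.0 * x + 3.0


rng = random.Random(0)
xs, ys, ws = [], [], []
for source_reward in (similar_source_reward, dissimilar_source_reward):
    for _ in range(100):
        x = rng.random()
        y = source_reward(x) + rng.gauss(0.0, NOISE_STD)
        xs.append(x)
        ys.append(y)
        ws.append(importance_weight(y, target_reward(x), source_reward(x), NOISE_STD))

# All samples are used in both fits; only their weights differ.
weighted_fit = weighted_line_fit(xs, ys, ws)
unweighted_fit = weighted_line_fit(xs, ys, [1.0] * len(xs))
print("weighted (intercept, slope):  ", weighted_fit)
print("unweighted (intercept, slope):", unweighted_fit)
```

In this toy setting, samples from the matching source receive a weight of exactly one, while samples from the shifted source receive weights close to zero, so the weighted fit stays close to the target reward model even though no sample is discarded. In practice the task models would have to be estimated from data, which this sketch does not attempt.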
