Using Bisimulation for Policy Transfer in MDPs

Knowledge transfer has been suggested as a useful approach for solving large Markov Decision Processes. The main idea is to compute a decision-making policy in one environment and use it in a different environment, provided the two are "close enough". In this paper, we use bisimulation-style metrics (Ferns et al., 2004) to guide knowledge transfer. We propose algorithms that decide which actions to transfer from the policy computed on a small MDP task to a large task, given the bisimulation distance between states in the two tasks. We demonstrate the inherent "pessimism" of bisimulation metrics and present variants of the metric designed to overcome this pessimism, leading to improved action transfer. We also show that using this approach to transfer temporally extended actions (Sutton et al., 1999) is more successful than using it exclusively with primitive actions. We present theoretical guarantees on the quality of the transferred policy, as well as promising empirical results.
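To make the machinery concrete, the following is a minimal sketch (not the authors' implementation) of bisimulation-based policy transfer between two finite MDPs that share an action set. It iterates the Ferns et al. (2004) metric to a fixed point, solving the Kantorovich term as a transportation LP, and then transfers actions by nearest-neighbour matching. The array layout, the coefficient choice c_R/c_T, and the helper names kantorovich, bisim_metric, and transfer_policy are illustrative assumptions.

```python
# A minimal sketch (not the authors' implementation) of bisimulation-based
# policy transfer between two finite MDPs sharing an action set.
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, d):
    """Kantorovich (Wasserstein-1) distance between distributions p and q
    under ground metric d, solved as a transportation LP; this is the
    min-cost-flow subproblem whose reoptimization is studied in [21]."""
    n = len(p)
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # row marginals of the coupling = p
        A_eq[n + i, i::n] = 1.0            # column marginals of the coupling = q
    res = linprog(d.flatten(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

def bisim_metric(R, P, c_R=0.5, c_T=0.5, iters=100, tol=1e-6):
    """Fixed-point iteration for the Ferns et al. (2004) metric.
    R: (S, A) expected rewards; P: (S, A, S) transition probabilities.
    To compare states across two tasks, apply this to their disjoint
    union and read off the cross-task block of the result.
    Choosing c_R + c_T <= 1 with c_T < 1 keeps the operator a contraction."""
    S, A = R.shape
    d = np.zeros((S, S))
    for _ in range(iters):
        d_new = np.zeros_like(d)
        for s in range(S):
            for t in range(s + 1, S):
                d_new[s, t] = max(
                    c_R * abs(R[s, a] - R[t, a])
                    + c_T * kantorovich(P[s, a], P[t, a], d)
                    for a in range(A))
                d_new[t, s] = d_new[s, t]  # the metric is symmetric
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d

def transfer_policy(d_cross, source_policy):
    """Copy, for each target state, the source action of its closest
    source state. d_cross: (S_target, S_source) bisimulation distances;
    source_policy: (S_source,) array of action indices."""
    return source_policy[np.argmin(d_cross, axis=1)]
```

In practice one would presumably transfer an action only when the nearest source state is within some distance threshold, since bisimulation metrics of this family bound how much optimal values can differ between nearby states.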

[1] Eliseo Ferrante, et al. Transfer of task representation in reinforcement learning using policy-based proto-value functions, 2008, AAMAS.

[2] Vishal Soni, et al. Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains, 2006, AAAI.

[3] James Worrell, et al. An Algorithm for Quantitative Verification of Probabilistic Transition Systems, 2001, CONCUR.

[4] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.

[5] Doina Precup, et al. Knowledge Transfer in Markov Decision Processes, 2006.

[6] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[7] M. Veloso, et al. Bounding the suboptimality of reusing subproblems, 1999, IJCAI.

[8] Balaraman Ravindran, et al. Relativized Options: Choosing the Right Transformation, 2003, ICML.

[9] F. Sunmola. Model Transfer for Markov Decision Tasks via Parameter Matching, 2006.

[10] Doina Precup, et al. Using Options for Knowledge Transfer in Reinforcement Learning, 1999.

[11] David Andre, et al. State abstraction for programmable reinforcement learning agents, 2002, AAAI/IAAI.

[12] Peter Stone, et al. Transfer Learning via Inter-Task Mappings for Temporal Difference Learning, 2007, J. Mach. Learn. Res.

[13] Satinder P. Singh, et al. Transfer via soft homomorphisms, 2009, AAMAS.

[14] Andrew G. Barto, et al. Building Portable Options: Skill Transfer in Reinforcement Learning, 2007, IJCAI.

[15] Alicia P. Wolfe. Defining Object Types and Options Using MDP Homomorphisms, 2006.

[16] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[17] Kim G. Larsen, et al. Bisimulation through Probabilistic Testing, 1991, Inf. Comput.

[18] Doina Precup, et al. Bounding Performance Loss in Approximate MDP Homomorphisms, 2008, NIPS.

[19] Doina Precup, et al. Using bisimulation for policy transfer in MDPs (Extended Abstract), 2010.

[20] Andrea Bonarini, et al. Transfer of samples in batch reinforcement learning, 2008, ICML.

[21] Antonio Frangioni, et al. A Computational Study of Cost Reoptimization for Min-Cost Flow Problems, 2006, INFORMS J. Comput.

[22] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.

[23] Peter Stone, et al. Improving Action Selection in MDP's via Knowledge Transfer, 2005, AAAI.

[24] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.

[25] Doina Precup, et al. Notions of State Equivalence under Partial Observability, 2009.

[26] Balaraman Ravindran, et al. Model Minimization in Hierarchical Reinforcement Learning, 2002, SARA.