Reinforcement learning transfer based on subgoal discovery and subtask similarity

This paper studies transfer learning in the context of reinforcement learning. We propose a novel transfer learning method that speeds up reinforcement learning with the aid of previously learned tasks. Before performing extensive learning episodes, our method analyzes the learning task through limited exploration of the environment and then reuses previous learning experience whenever it is possible and appropriate. The proposed method consists of four stages: 1) subgoal discovery, 2) option construction, 3) similarity search, and 4) option reuse. To identify similar options, we propose a novel similarity measure between options, built on the intuition that similar options have similar state-action probabilities. We evaluate our algorithm in extensive experiments against existing methods. The results show that our method outperforms both conventional non-transfer reinforcement learning algorithms and existing transfer learning methods by a wide margin.
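The abstract does not give the exact form of the option similarity measure, but the stated intuition (similar options induce similar state-action probabilities) can be illustrated with a minimal sketch. The sketch below assumes tabular option policies defined over a shared, aligned state space and compares them via a weighted total-variation distance between their per-state action distributions; the function name `option_similarity` and its arguments are hypothetical, not the paper's actual definition.

```python
import numpy as np

def option_similarity(policy_a, policy_b, state_weights=None):
    """Illustrative similarity between two options (assumption, not the paper's formula).

    policy_a, policy_b: arrays of shape (n_states, n_actions) holding each
    option's state-action probabilities over a shared, aligned state space.
    state_weights: optional weighting over states (e.g., visitation counts).

    Returns a score in [0, 1]; 1 means identical action distributions everywhere.
    """
    # Per-state total-variation distance between the two action distributions.
    tv = 0.5 * np.abs(policy_a - policy_b).sum(axis=1)

    # Default to uniform weighting over states; otherwise normalize the weights.
    if state_weights is None:
        state_weights = np.full(policy_a.shape[0], 1.0 / policy_a.shape[0])
    else:
        state_weights = np.asarray(state_weights, dtype=float)
        state_weights = state_weights / state_weights.sum()

    # Weighted average distance, converted to a similarity score.
    return 1.0 - float(np.dot(state_weights, tv))


# Usage: two options over 4 states and 3 actions.
pi_a = np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4],
                 [0.5, 0.25, 0.25]])
pi_b = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.7, 0.2],
                 [0.3, 0.3, 0.4],
                 [0.4, 0.3, 0.3]])
print(option_similarity(pi_a, pi_b))
```

A measure of this kind lets the similarity-search stage rank previously constructed options against a newly discovered subtask so that only the closest matches are carried into the option-reuse stage.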
