Latent Structure Matching for Knowledge Transfer in Reinforcement Learning

Reinforcement learning algorithms usually require a large number of empirical samples and converge slowly in practical applications. One remedy is transfer learning: knowledge from well-learned source tasks can be reused to reduce sample requirements and accelerate learning on target tasks. However, if a poorly matched source task is selected, it will slow down or even disrupt the learning procedure. It is therefore crucial to select source tasks that match the target tasks closely. In this paper, a novel task matching algorithm is proposed that derives the latent structures of the tasks' value functions and aligns these structures to estimate similarity. Through latent structure matching, highly matched source tasks are selected effectively; knowledge is then transferred from them to give action advice and to improve the exploration strategies of the target tasks. Experiments are conducted in a simulated navigation environment and the mountain car environment. The results show a significant performance gain for the improved exploration strategy over the traditional ε-greedy exploration strategy. A theoretical proof is also given to verify the improvement of the exploration strategy based on latent structure matching.
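The matching pipeline described above — derive a latent structure for each task's value function, align the structures, score similarity, and pick the best-matched source — can be sketched as follows. This is a minimal illustrative sketch under stated assumptions, not the paper's exact algorithm: it assumes tabular Q-functions, uses a truncated SVD as the latent structure, orthogonal Procrustes as the alignment step, and an inverse-residual similarity score; all function names are hypothetical.

```python
import numpy as np

def latent_structure(q_table, rank=2):
    """Low-rank latent structure of a |S| x |A| Q-table via truncated SVD.
    (Assumed choice of structure; the paper's construction may differ.)"""
    u, s, _ = np.linalg.svd(q_table, full_matrices=False)
    return u[:, :rank] * s[:rank]            # |S| x rank state embedding

def align_and_score(src, tgt):
    """Align src onto tgt with orthogonal Procrustes, return a similarity.
    Solves min_R ||src @ R - tgt||_F over orthogonal R."""
    u, _, vt = np.linalg.svd(src.T @ tgt)    # rank x rank cross-covariance
    r = u @ vt                               # optimal rotation/reflection
    residual = np.linalg.norm(src @ r - tgt)
    return 1.0 / (1.0 + residual)            # higher score = better match

def select_source(source_qs, target_q, rank=2):
    """Pick the source task whose latent structure best matches the target."""
    tgt = latent_structure(target_q, rank)
    scores = [align_and_score(latent_structure(q, rank), tgt)
              for q in source_qs]
    return int(np.argmax(scores)), scores

# Toy check: a near-copy of the target task should outscore a random one.
rng = np.random.default_rng(0)
target = rng.normal(size=(20, 4))                         # 20 states, 4 actions
similar = target + 0.05 * rng.normal(size=target.shape)   # closely related task
unrelated = rng.normal(size=(20, 4))                      # unmatched task
best, scores = select_source([unrelated, similar], target)
print(best, scores)
```

Once a source is selected this way, its greedy action in the aligned state embedding can replace the random action in an ε-greedy step, which is one simple way to realize the "action advice" the abstract refers to.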
