Proto-transfer Learning in Markov Decision Processes Using Spectral Methods

In this paper we introduce proto-transfer learning, a new framework for transfer learning. We explore solutions to transfer learning within reinforcement learning through the use of spectral methods. Proto-value functions (PVFs) are basis functions computed from a spectral analysis of random walks on the state space graph. They naturally lead to the ability to transfer knowledge and representation between related tasks or domains. We investigate task transfer by using the same PVFs in Markov decision processes (MDPs) with different reward functions. Additionally, our experiments in domain transfer explore applying the Nyström method for interpolation of PVFs between MDPs of different sizes.

1. Problem Statement

The aim of transfer learning is to reuse behavior by applying the knowledge learned about one domain or task to accelerate learning in a related domain or task. In this paper we explore solutions to transfer learning within reinforcement learning (Sutton & Barto, 1998) through spectral methods. The new framework of proto-transfer learning transfers representations from one domain to another. This transfer entails the reuse of eigenvectors learned on one graph for another. We explore how to transfer knowledge learned on the source graph to a similar graph by modifying the eigenvectors of the Laplacian of the source domain so that they can be reused for the target domain. Proto-value functions (PVFs) are a natural abstraction since they condense a domain by automatically learning an embedding of the state space based on its topology (Mahadevan, 2005). PVFs lead to the ability to transfer knowledge about domains and tasks, since they are constructed without taking reward into account.

We define task transfer as the problem of transferring knowledge when the state space remains the same and only the reward differs. For task transfer, task-independent basis functions, such as PVFs, can be reused from one task to the next without modification. Domain transfer refers to the more challenging problem in which the state space itself changes. This change in state space can be a change in topology (e.g., obstacles moving to different locations) or a change in scale (e.g., a smaller or larger domain of the same shape). For domain transfer, the basis functions may need to be modified to reflect the changes in the state space.

Foster and Dayan (2002) study the task transfer problem by applying unsupervised mixture-model learning methods to a collection of optimal value functions for different tasks in order to decompose and extract their underlying structure. In this paper, we investigate task transfer in discrete domains by reusing PVFs in MDPs with different reward functions. For domain transfer, we apply the Nyström extension to interpolate PVFs between MDPs of different sizes (Mahadevan et al., 2006). Previous work has accelerated learning when transferring behaviors between tasks and domains (Taylor et al., 2005), whereas we transfer representations and reuse knowledge to learn comparably on a new task or domain.
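To make the construction concrete, the sketch below computes PVFs as the smoothest eigenvectors of the normalized graph Laplacian of a grid-world state-space graph and outlines a Nyström-style extension for interpolating them to new states, mirroring the task- and domain-transfer settings described above. The grid sizes, the 4-neighbour adjacency, and the function names (grid_adjacency, proto_value_functions, nystrom_extend) are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of proto-value functions (PVFs) and a Nystrom-style
# extension. All names and grid choices here are illustrative assumptions,
# not the authors' implementation.
import numpy as np


def grid_adjacency(n):
    """Adjacency matrix W of an n x n grid with 4-neighbour connectivity."""
    N = n * n
    W = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if r + 1 < n:                      # edge to the cell below
                W[i, i + n] = W[i + n, i] = 1.0
            if c + 1 < n:                      # edge to the cell to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W


def proto_value_functions(W, k):
    """Return the k smallest eigenvalues and eigenvectors of the normalized
    graph Laplacian L = I - D^{-1/2} W D^{-1/2}; the eigenvectors (smoothest
    functions on the graph) serve as the PVF basis."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)             # ascending eigenvalue order
    return vals[:k], vecs[:, :k]


# Task transfer: the basis depends only on the graph, so the same Phi is
# reused unchanged for any reward function on this state space
# (e.g. approximate any value function as V ~ Phi @ w).
W_src = grid_adjacency(10)
lam, Phi = proto_value_functions(W_src, k=20)  # |S| x k reward-free basis


def nystrom_extend(W_cross, d_src, lam, Phi):
    """Interpolate Laplacian eigenvectors to new states (domain transfer).

    W_cross[x, j] is a similarity between new state x and source state j
    (e.g. obtained by mapping cells of a larger grid onto the source grid).
    The formula uses the eigenvectors of A = D^{-1/2} W D^{-1/2}, whose
    eigenvalues are mu = 1 - lam (assumed nonzero for the retained PVFs)."""
    d_new = W_cross.sum(axis=1)
    mu = 1.0 - lam
    scaled = W_cross / np.sqrt(np.outer(d_new, d_src))
    return (scaled @ Phi) / mu                 # one column per extended PVF
```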