论文信息 - Action-Space Knowledge Transfer in MDP ’ s : Formalism , Suboptimality Bounds , and Algorithms ?

Action-Space Knowledge Transfer in MDP ’ s : Formalism , Suboptimality Bounds , and Algorithms ?

Temporal-difference reinforcement learning (RL) has been successfully applied in several domains with large statesets. Largeaction sets, however, have received considerably less attention. This pape r studies the use of knowledge transfer between related tasks to accelerate lea rning with large action sets. We introduce action transfer , a technique that extracts the actions from the (near-)optimal solution to the first task and uses them in place of the full action set when learning any subsequent tasks. When optimal ac tions make up a small fraction of the domain’s action set, action transfer c an substantially reduce the number of actions and thus the complexity of the problem. However, action transfer betweendissimilar tasks can be detrimental. We present a novel formalism of related tasks and use it to derive a bound on the subopti mality of action transfer. We additionally bound action-transfer suboptim al ty in generic MDP’s and analyze the feasibility of provably reliableaction transfer. We build on this analysis to propose randomized task perturbation (RTP), an enhancement to action transfer that makes it robust to unrepresentative sour ce tasks. The empirical results in this paper show the potential of RTP action transf er to substantially expand the applicability of RL to problems with large action sets.

Alexander A. Sherstov | P. Stone

[1] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[2] Carlos Guestrin,et al. Generalizing plans to new environments in relational MDPs , 2003, IJCAI 2003.

[3] Craig Boutilier,et al. Symbolic Dynamic Programming for First-Order MDPs , 2001, IJCAI.

[4] Peter Stone,et al. Scaling Reinforcement Learning toward RoboCup Soccer , 2001, ICML.

[5] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[6] Alexander Zelinsky,et al. Q-Learning in Continuous State and Action Spaces , 1999, Australian Joint Conference on Artificial Intelligence.

[7] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.

[8] Ashwin Ram,et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..

[9] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[10] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..

[11] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.