论文信息 - Improving Action Selection in MDP's via Knowledge Transfer

Improving Action Selection in MDP's via Knowledge Transfer

Temporal-difference reinforcement learning (RL) has been successfully applied in several domains with large state sets. Large action sets, however, have received considerably less attention. This paper demonstrates the use of knowledge transfer between related tasks to accelerate learning with large action sets. We introduce action transfer, a technique that extracts the actions from the (near-)optimal solution to the first task and uses them in place of the full action set when learning any subsequent tasks. When optimal actions make up a small fraction of the domain's action set, action transfer can substantially reduce the number of actions and thus the complexity of the problem. However, action transfer between dissimilar tasks can be detrimental. To address this difficulty, we contribute randomized task perturbation (RTP), an enhancement to action transfer that makes it robust to unrepresentative source tasks. We motivate RTP action transfer with a detailed theoretical analysis featuring a formalism of related tasks and a bound on the suboptimality of action transfer. The empirical results in this paper show the potential of RTP action transfer to substantially expand the applicability of RL to problems with large action sets.

Peter Stone | Alexander A. Sherstov | P. Stone

[1] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[2] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..

[3] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[4] Ashwin Ram,et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..

[5] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.

[6] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .

[7] Alexander Zelinsky,et al. Q-Learning in Continuous State and Action Spaces , 1999, Australian Joint Conference on Artificial Intelligence.

[8] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[9] Craig Boutilier,et al. Symbolic Dynamic Programming for First-Order MDPs , 2001, IJCAI.

[10] Peter Stone,et al. Scaling Reinforcement Learning toward RoboCup Soccer , 2001, ICML.

[11] Carlos Guestrin,et al. Generalizing plans to new environments in relational MDPs , 2003, IJCAI 2003.

[12] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.