Building a Library of Policies through Policy Reuse

Abstract: Policy Reuse (PR) provides Reinforcement Learning algorithms with a mechanism to bias an exploration process by reusing a set of past policies. Policy Reuse poses the challenge of balancing the exploitation of the policy currently being learned, the exploration of new random actions, and the exploitation of past policies. Applying Policy Reuse efficiently requires a mechanism that builds, for each domain, a library of policies that is useful and accurate enough to efficiently solve any task in that domain. In this work, the authors propose a mechanism to create a library of policies based on a similarity metric among policies. If a new policy is similar to any of the past ones, it is not added to the library; otherwise, it is stored together with the other policies so that it can be reused in the future. Thus, the Policy Library stores the "basis" or "eigen-policies" of each domain (i.e., the core past policies that are effectively reusable). Empirical results demonstrate that the Policy Library can be efficiently created and that the stored "eigen-policies" can be understood as a representation of the structure of the domain.
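The admission rule described in the abstract (store a new policy only if it is not similar to any stored one) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the similarity metric, so `action_agreement` (the fraction of states on which two tabular policies choose the same action) is a hypothetical stand-in, and the threshold `delta`, the class `PolicyLibrary`, and its method names are likewise hypothetical.

```python
from typing import Callable, Generic, List, Sequence, TypeVar

P = TypeVar("P")  # any policy representation, e.g. a state -> action table


class PolicyLibrary(Generic[P]):
    """Hypothetical sketch: keep only policies not similar to any stored one."""

    def __init__(self, similarity: Callable[[P, P], float], delta: float) -> None:
        # Assumption: similarity(new, old) returns a score in [0, 1], and two
        # policies count as "similar" when that score is at least delta.
        if not 0.0 <= delta <= 1.0:
            raise ValueError("delta must lie in [0, 1]")
        self.similarity = similarity
        self.delta = delta
        self.policies: List[P] = []

    def is_similar_to_library(self, policy: P) -> bool:
        # "Similar to any of the past ones": a single match is enough to reject.
        return any(self.similarity(policy, old) >= self.delta for old in self.policies)

    def consider(self, policy: P) -> bool:
        """Store the policy unless it is similar to a stored one; return True if stored."""
        if self.is_similar_to_library(policy):
            return False
        self.policies.append(policy)
        return True


def action_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical metric: fraction of states where both tabular policies pick the same action."""
    if len(a) != len(b):
        raise ValueError("policies must cover the same states")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    lib = PolicyLibrary(action_agreement, delta=0.75)
    print(lib.consider([0, 1, 2, 3, 0]))  # True: the library starts empty
    print(lib.consider([0, 1, 2, 3, 1]))  # False: agrees with the first policy on 4 of 5 states
    print(lib.consider([3, 2, 1, 0, 3]))  # True: agrees with the stored policy on no state
    print(len(lib.policies))              # 2
```

Because each policy is compared only against what is already stored, this greedy rule makes the final library depend on the order in which tasks are learned; the stored policies play the role of the "eigen-policies" only to the extent that the chosen metric and threshold capture reusability.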

Manuela Veloso | Fernando Fernández
