Probabilistic policy reuse in a reinforcement learning agent

We contribute Policy Reuse, a technique to improve a reinforcement learning agent with guidance from previously learned, similar policies. Our method uses the past policies as a probabilistic bias: at each step the learning agent chooses among exploiting the policy currently being learned, exploring random unexplored actions, and exploiting a past policy. We introduce the algorithm and its major components: an exploration strategy that incorporates the new reuse bias, and a similarity function that estimates how similar past policies are to the new one. We provide empirical results demonstrating that Policy Reuse improves learning performance over several strategies that learn without reuse. Interestingly, and almost as a side effect, Policy Reuse also identifies classes of similar policies, revealing a basis of core policies of the domain. We demonstrate that such a basis can be built incrementally, contributing to the learning of the structure of a domain.
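The paper's exact reuse strategy and parameter schedule are not reproduced here; the following is a minimal, illustrative sketch of the three-way choice the abstract describes, assuming a tabular Q-value setting. All names (psi, epsilon, past_policy, q_values) are illustrative placeholders, not the paper's notation or API.

```python
import random

def reuse_biased_action(q_values, past_policy, state, actions, psi, epsilon):
    """Illustrative action selection mixing the three choices described above:
    with probability psi, reuse the past policy as a probabilistic bias;
    otherwise act epsilon-greedily on the Q-values being learned."""
    if random.random() < psi:
        # Exploit a past policy: take the action it prescribes in this state.
        return past_policy[state]
    if random.random() < epsilon:
        # Explore a random, possibly unexplored action.
        return random.choice(actions)
    # Exploit the ongoing learned policy (greedy on the current Q-values).
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))
```

In the sketch, psi controls how strongly the past policy biases exploration, while epsilon keeps ordinary random exploration available; decaying psi over episodes would gradually shift the agent from reuse toward its own learned policy.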
