Patching Approximate Solutions in Reinforcement Learning

This paper introduces an approach to improving an approximate solution in reinforcement learning by augmenting it with a small overriding patch. Approximate solutions are often smaller and easier to produce than a flat solution, but the best solution expressible within the constraints of the approximation may fall well short of global optimality. We present a technique for efficiently learning a small patch that reduces this gap. Empirical evaluation demonstrates the effectiveness of patching: the combined solutions come much closer to global optimality than the approximate solutions alone.
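The core mechanism described above can be sketched as a small table of overrides layered on top of a base policy. This is a minimal illustrative sketch, not the paper's actual method or notation; all names (`make_patched_policy`, the grid states, the actions) are assumptions for the example:

```python
def make_patched_policy(base_policy, patch):
    """Combine a base approximate policy with a small overriding patch.

    base_policy: callable mapping a state to an action.
    patch: dict mapping a small set of states to overriding actions.
    """
    def policy(state):
        # Wherever the patch is defined, it takes priority over the
        # base approximate solution; elsewhere the base is unchanged.
        if state in patch:
            return patch[state]
        return base_policy(state)
    return policy

# Toy usage: a base policy that always moves "right", patched to move
# "up" in two grid states where "right" is assumed to be suboptimal.
base = lambda state: "right"
patch = {(2, 3): "up", (2, 4): "up"}
policy = make_patched_policy(base, patch)

print(policy((0, 0)))  # right  (outside the patch: base policy applies)
print(policy((2, 3)))  # up     (inside the patch: override applies)
```

Because the patch only needs to cover the states where the approximation errs, it can stay much smaller than a flat solution over the whole state space.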