Model-based Policy Gradient Reinforcement Learning

Policy gradient methods based on REINFORCE are model-free in the sense that they estimate the gradient using only online experience gathered by executing the current stochastic policy. This makes them wasteful of training data and computationally inefficient. This paper presents a new model-based policy gradient algorithm that uses training experience much more efficiently. Our approach constructs a series of incomplete models of the MDP and then applies these models to compute the policy gradient in closed form. The paper describes an algorithm that alternates between pruning (to remove irrelevant parts of the incomplete MDP model), exploration (to gather training data in the relevant parts of the state space), and gradient ascent. We show experimental results on several benchmark problems, including resource-constrained scheduling. The overall feasibility of this approach depends on whether a sufficiently informative partial model can fit into available memory.
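
To make the alternating structure concrete, the following is a minimal, hypothetical Python sketch of a model-based policy gradient loop in this spirit; it is not the paper's algorithm. It assumes a toy chain MDP, a tabular softmax policy, a transition/reward model estimated from visit counts (the incomplete model), a simple visit-count threshold for pruning, and a closed-form gradient computed on the estimated model via the policy gradient theorem, dJ/dtheta[s,a] = d(s) * pi(a|s) * (Q(s,a) - V(s)). The environment, thresholds, and step sizes are illustrative assumptions only.

# Hypothetical sketch of a model-based policy gradient loop (not the paper's algorithm).
# Alternates exploration (collect transitions with the current stochastic policy),
# pruning (keep only sufficiently visited state-action pairs in the partial model),
# and a closed-form gradient step on the estimated model via the policy gradient theorem.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 6, 2, 0.95            # toy chain MDP: move left/right, reward at the right end

def true_step(s, a):
    # Environment used only to generate experience; the learner never reads it directly.
    s2 = min(S - 1, s + 1) if a == 1 else max(0, s - 1)
    return s2, float(s2 == S - 1)

def softmax_policy(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

theta = np.zeros((S, A))
counts = np.zeros((S, A, S))        # transition counts: the incomplete model
rew_sum = np.zeros((S, A))

for _ in range(50):
    pi = softmax_policy(theta)

    # Exploration: gather experience with the current stochastic policy.
    s = 0
    for _ in range(200):
        a = rng.choice(A, p=pi[s])
        s2, r = true_step(s, a)
        counts[s, a, s2] += 1
        rew_sum[s, a] += r
        s = s2 if rng.random() > 0.05 else 0      # occasional reset to the start state

    # Pruning: keep only (s, a) pairs with enough visits; fall back to a uniform guess elsewhere.
    n_sa = counts.sum(axis=2)
    known = n_sa >= 5
    P_hat = np.where(known[:, :, None], counts / np.maximum(n_sa, 1)[:, :, None], 1.0 / S)
    R_hat = np.where(known, rew_sum / np.maximum(n_sa, 1), 0.0)

    # Closed-form gradient on the estimated model (policy gradient theorem).
    P_pi = np.einsum('sa,sat->st', pi, P_hat)         # state transition matrix under pi
    r_pi = (pi * R_hat).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R_hat + gamma * P_hat @ V
    mu0 = np.eye(S)[0]                                # start-state distribution
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu0)   # discounted state occupancy
    grad = d[:, None] * pi * (Q - V[:, None])         # dJ/dtheta[s, a]

    theta += 0.5 * grad                               # gradient ascent step

print("Learned greedy actions per state:", softmax_policy(theta).argmax(axis=1))

In this sketch the "model-based" saving comes from the linear solves: once the partial model is estimated, V, Q, and the occupancy d are computed exactly on that model, so no additional rollouts are needed to estimate the gradient itself.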
