Paper Information - A Fast and Reliable Policy Improvement Algorithm

A Fast and Reliable Policy Improvement Algorithm

We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.

Stephen J. Wright | Peter L. Bartlett | Yasin Abbasi-Yadkori

[1] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.

[2] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.

[3] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[4] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.

[5] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.

[6] Bruno Scherrer, et al. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.

[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[8] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.

[9] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.