A Fast and Reliable Policy Improvement Algorithm
[1] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.
[2] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[3] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[4] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[5] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[6] Bruno Scherrer, et al. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[8] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.
[9] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.