Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search
[1] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[2] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[3] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[4] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS 2008.
[5] Bruno Scherrer, et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, 2012, NIPS.
[6] K. I. M. McKinnon, et al. On the Generation of Markov Decision Processes, 1995.
[7] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[8] Christian Igel, et al. Evolution Strategies for Direct Policy Search, 2008, PPSN.
[9] Matthieu Geist, et al. Approximate Modified Policy Iteration, 2012, ICML.
[10] Alessandro Lazaric, et al. Finite-sample analysis of least-squares policy iteration, 2012, J. Mach. Learn. Res.
[11] Robert Givan, et al. Approximate Policy Iteration with a Policy Language Bias, 2003, NIPS.
[12] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[13] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[14] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[15] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[18] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[19] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[20] Simon M. Lucas, et al. Parallel Problem Solving from Nature - PPSN X, 10th International Conference Dortmund, Germany, September 13-17, 2008, Proceedings, 2008, PPSN.
[21] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim.
[22] Alessandro Lazaric, et al. Analysis of a Classification-based Policy Iteration Algorithm, 2010, ICML.
[23] Peter L. Bartlett, et al. Boosting Algorithms as Gradient Descent in Function Space, 2007.
[24] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[25] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[26] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[27] Alessandro Lazaric, et al. Conservative and Greedy Approaches to Classification-Based Policy Iteration, 2012, AAAI.