Approximate Policy Iteration Schemes: A Comparison
[1] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[2] Alessandro Lazaric, et al. Conservative and Greedy Approaches to Classification-Based Policy Iteration, 2012, AAAI.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Alessandro Lazaric, et al. Analysis of a Classification-based Policy Iteration Algorithm, 2010, ICML.
[5] Bruno Scherrer, et al. Performance bounds for λ policy iteration and application to the game of Tetris, 2013, J. Mach. Learn. Res.
[6] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[7] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim.
[8] Bruno Scherrer, et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, 2012, NIPS.
[9] K. I. M. McKinnon, et al. On the Generation of Markov Decision Processes, 1995.
[10] Matthieu Geist, et al. Approximate Modified Policy Iteration, 2012, ICML.
[11] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[12] Bruno Scherrer, et al. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris, 2007.
[13] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.
[14] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[15] U. Rieder, et al. Markov Decision Processes, 2010.
[16] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[17] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.