[1] M. Puterman et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[2] Satinder Singh et al. An upper bound on the loss from approximate optimal-value functions, Machine Learning, 1994.
[3] Bruno Scherrer et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, NIPS, 2012.
[4] B. Scherrer et al. Performance bound for Approximate Optimistic Policy Iteration, 2010.
[5] A. Antos et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.
[6] Sham M. Kakade. On the sample complexity of reinforcement learning, 2003.
[7] Uriel G. Rothblum et al. (Approximate) iterated successive approximations algorithm for sequential decision processes, Annals of Operations Research, 2013.
[8] Matthieu Geist et al. Approximate Modified Policy Iteration, ICML, 2012.
[9] Csaba Szepesvári et al. Fitted Q-iteration in continuous action-space MDPs, NIPS, 2007.
[10] Csaba Szepesvári et al. Error Propagation for Approximate Policy and Value Iteration, NIPS, 2010.
[11] Rémi Munos et al. Performance Bounds in Lp-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, 2007.
[12] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996.
[13] Rémi Munos. Error Bounds for Approximate Policy Iteration, ICML, 2003.
[14] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[15] Dimitri P. Bertsekas et al. Q-learning and enhanced policy iteration in discounted dynamic programming, 49th IEEE Conference on Decision and Control (CDC), 2010.