Approximate Modified Policy Iteration
[1] Csaba Szepesvári. Reinforcement Learning Algorithms for MDPs, 2010.
[2] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[3] B. Scherrer, et al. Performance bound for Approximate Optimistic Policy Iteration, 2010.
[4] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[5] Bruno Scherrer, et al. Classification-based Policy Iteration with a Critic, 2011, ICML.
[6] Uriel G. Rothblum, et al. (Approximate) iterated successive approximations algorithm for sequential decision processes, 2013, Ann. Oper. Res.
[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[8] Robert Givan, et al. Approximate Policy Iteration with a Policy Language Bias, 2003, NIPS.
[9] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[10] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim.
[11] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[12] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.