[1] Dimitri P. Bertsekas, et al. New error bounds for approximations from projected linear equations, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.
[2] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[3] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[4] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[5] Ralf Schoknecht, et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, 2002, NIPS.
[6] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[7] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[8] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[9] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[10] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[11] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[12] Carlos Guestrin, et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.
[13] Daniel B. Szyld, et al. The many proofs of an identity on the norm of oblique projections, 2006, Numerical Algorithms.
[14] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[15] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[16] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.