Parametric value function approximation: A unified view
暂无分享,去创建一个
[1] Olivier Buffet,et al. Markov Decision Processes in Artificial Intelligence , 2010 .
[2] Matthieu Geist,et al. Statistically linearized least-squares temporal differences , 2010, International Congress on Ultra Modern Telecommunications and Control Systems.
[3] Robert Fitch,et al. Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation , 2007, ICML '07.
[4] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[5] David Choi,et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..
[6] A. Barto,et al. Improved Temporal Difference Methods with Linear Function Approximation , 2004 .
[7] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[8] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[9] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[10] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[11] Yaakov Engel,et al. Algorithms and representations for reinforcement learning (עם תקציר בעברית, תכן ושער נוסף: אלגוריתמים וייצוגים ללמידה מחיזוקים.; אלגוריתמים וייצוגים ללמידה מחיזוקים.) , 2005 .
[12] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[13] O. Pietquin,et al. A Brief Survey of Parametric Value Function Approximation A Brief Survey of Parametric Value Function Approximation , 2010 .
[14] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[15] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[16] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[17] O. Pietquin,et al. Statistically linearized recursive least squares , 2010, 2010 IEEE International Workshop on Machine Learning for Signal Processing.
[18] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[19] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[20] Matthieu Geist,et al. Eligibility traces through colored noises , 2010, International Congress on Ultra Modern Telecommunications and Control Systems.
[21] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[22] D. Bertsekas,et al. Q-learning algorithms for optimal stopping based on least squares , 2007, 2007 European Control Conference (ECC).
[23] Richard S. Sutton,et al. GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[24] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.
[25] O. Pietquin,et al. Managing Uncertainty within Value Function Approximation in Reinforcement Learning , 2010 .
[26] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[27] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[28] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[29] Xin Xu,et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning , 2007, IEEE Transactions on Neural Networks.
[30] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[31] Matthieu Geist,et al. Tracking in Reinforcement Learning , 2009, ICONIP.
[32] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[33] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[34] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[35] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[36] T. Jung,et al. Kernelizing LSPE(λ) , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[37] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[38] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[39] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .
[40] Shie Mannor,et al. The kernel recursive least-squares algorithm , 2004, IEEE Transactions on Signal Processing.
[41] Matthieu Geist,et al. Kalman Temporal Differences: The deterministic case , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[42] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[43] D. Bertsekas. Projected Equations, Variational Inequalities, and Temporal Difference Methods , 2009 .
[44] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[45] Matthieu Geist,et al. Kalman Temporal Differences , 2010, J. Artif. Intell. Res..
[46] D. Bertsekas,et al. Journal of Computational and Applied Mathematics Projected Equation Methods for Approximate Solution of Large Linear Systems , 2022 .
[47] T. Başar,et al. A New Approach to Linear Filtering and Prediction Problems , 2001 .
[48] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[49] T. Söderström,et al. Instrumental variable methods for system identification , 1983 .
[50] Alborz Geramifard,et al. Incremental Least-Squares Temporal Difference Learning , 2006, AAAI.
[51] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[52] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[53] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.