Least-Squares Methods for Policy Iteration
Lucian Busoniu | Alessandro Lazaric | Mohammad Ghavamzadeh | Rémi Munos | Robert Babuška | Bart De Schutter
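The core algorithm family this bibliography covers is least-squares temporal-difference learning for Q-functions (LSTD-Q) wrapped in policy iteration (LSPI, Lagoudakis & Parr [21]). As a minimal illustrative sketch (not the authors' implementation): with linear features, policy evaluation reduces to solving a linear system A w = b built from sampled transitions, and policy improvement is greedy with respect to the resulting Q. Function names and the one-hot feature map below are hypothetical choices for the example.

```python
import numpy as np

def lstd_q(samples, phi, policy, n_features, gamma=0.9, reg=1e-6):
    """One LSTD-Q evaluation step: solve A w = b from a batch of transitions.

    samples: iterable of (s, a, r, s_next); phi(s, a) returns a feature vector.
    """
    A = reg * np.eye(n_features)  # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, n_features, gamma=0.9, n_iters=20):
    """Least-squares policy iteration: alternate LSTD-Q with greedy improvement."""
    w = np.zeros(n_features)
    for _ in range(n_iters):
        # greedy policy with respect to the current Q-function estimate
        policy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstd_q(samples, phi, policy, n_features, gamma)
        if np.allclose(w, w_new, atol=1e-8):
            break  # policy (and weights) have stabilized
        w = w_new
    return w
```

On a tiny deterministic MDP with one-hot (tabular) features, the fixed point coincides with the exact Q-function of the greedy-optimal policy, which is the sanity check the least-squares literature above builds on before moving to genuine function approximation.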
[1] Gene H. Golub, et al. Matrix computations, 1983.
[2] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[3] D. Zwillinger. Least Squares Method, 1992.
[4] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[5] Dimitri P. Bertsekas, et al. A Counterexample to Temporal Differences Learning, 1995, Neural Computation.
[6] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[7] Gene H. Golub, et al. Matrix computations (3rd ed.), 1996.
[8] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[9] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[10] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[11] Dimitri P. Bertsekas, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1997.
[12] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[13] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[14] Ioannis Vlahavas, et al. Methods and Applications of Artificial Intelligence, 2002, Lecture Notes in Computer Science.
[15] John N. Tsitsiklis, et al. On the Convergence of Optimistic Policy Iteration, 2002, J. Mach. Learn. Res..
[16] Michail G. Lagoudakis, et al. Least-Squares Methods in Reinforcement Learning for Control, 2002, SETN.
[17] Carl E. Rasmussen, et al. Gaussian Processes in Reinforcement Learning, 2003, NIPS.
[18] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[19] Shie Mannor, et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, 2003, ICML.
[20] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst..
[21] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res..
[22] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[23] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[24] Andrew W. Moore, et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 2004, Machine Learning.
[25] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[26] A. Barto, et al. Improved Temporal Difference Methods with Linear Function Approximation, 2004.
[27] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res..
[28] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[29] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[30] Alborz Geramifard, et al. Incremental Least-Squares Temporal Difference Learning, 2006, AAAI.
[31] Xin Xu, et al. Kernel Least-Squares Temporal Difference Learning, 2006.
[32] Xin Xu, et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, 2007, IEEE Transactions on Neural Networks.
[33] T. Jung, et al. Kernelizing LSPE(λ), 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[34] Daniel Polani, et al. Learning RoboCup-Keepaway with Kernels, 2007, Gaussian Processes in Practice.
[35] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[36] Warren B. Powell, et al. Approximate Dynamic Programming - Solving the Curses of Dimensionality, 2007.
[37] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[38] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res..
[39] Christos Dimitrakakis, et al. Rollout sampling approximate policy iteration, 2008, Machine Learning.
[40] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[41] Dimitri P. Bertsekas, et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[42] Gavin Taylor, et al. Kernelized value function approximation for reinforcement learning, 2009, ICML '09.
[43] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML '09.
[44] Lihong Li, et al. Online exploration in least-squares policy iteration, 2009, AAMAS.
[45] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[46] Damien Ernst, et al. Using prior knowledge to accelerate online least-squares policy iteration, 2010, 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR).
[47] Alessandro Lazaric, et al. Analysis of a Classification-based Policy Iteration Algorithm, 2010, ICML.
[48] Bruno Scherrer, et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, 2010, ICML.
[49] Dimitri P. Bertsekas, et al. Error Bounds for Approximations from Projected Linear Equations, 2010, Math. Oper. Res..
[50] Bart De Schutter, et al. Online least-squares policy iteration for reinforcement learning control, 2010, Proceedings of the 2010 American Control Conference.
[51] B. Scherrer, et al. Least-Squares Policy Iteration: Bias-Variance Trade-off in Control Problems, 2010, ICML.
[52] Alessandro Lazaric, et al. Finite-sample Analysis of Bellman Residual Minimization, 2010, ACML.
[53] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[54] Huizhen Yu, et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions, 2010, ICML.
[55] Alessandro Lazaric, et al. Finite-Sample Analysis of LSTD, 2010, ICML.
[56] Bart De Schutter, et al. Approximate dynamic programming with a fuzzy parameterization, 2010, Autom..
[57] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[58] Dimitri P. Bertsekas, et al. Approximate Dynamic Programming, 2017, Encyclopedia of Machine Learning and Data Mining.
[59] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[60] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[61] Dimitri P. Bertsekas, et al. Temporal Difference Methods for General Projected Equations, 2011, IEEE Transactions on Automatic Control.