Kernel-Based Least Squares Policy Iteration for Reinforcement Learning
Xin Xu | Dewen Hu | Xicheng Lu
[1] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[2] R. Murray,et al. Nonlinear controllers for non-integrable systems: the Acrobot example , 1990, 1990 American Control Conference.
[3] M. W. Spong,et al. Pseudolinearization of the acrobot using spline functions , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.
[4] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[6] Mark W. Spong,et al. The swing up control problem for the Acrobot , 1995 .
[7] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[8] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[9] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[10] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[11] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[12] Stephen Yurkovich,et al. Fuzzy Control , 1997 .
[13] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[14] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[15] Alexander J. Smola,et al. Learning with kernels , 1998 .
[16] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[17] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[18] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[19] Daphne Koller,et al. Policy Iteration for Factored MDPs , 2000, UAI.
[20] K. Passino. Intelligent Control: An Overview of Techniques , 2000 .
[21] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[22] T. Samad. Intelligent Control: An Overview of Techniques , 2001 .
[23] Matthew Saffell,et al. Learning to trade via direct reinforcement , 2001, IEEE Trans. Neural Networks.
[24] Xin Wang,et al. Batch Value Function Approximation via Support Vectors , 2001, NIPS.
[25] H. He,et al. Efficient Reinforcement Learning Using Recursive Least-Squares Methods , 2002, J. Artif. Intell. Res..
[26] Xin Xu,et al. Residual-gradient-based neural reinforcement learning for the optimal control of an acrobot , 2002, Proceedings of the IEEE International Symposium on Intelligent Control.
[27] Dustin Boswell,et al. Introduction to Support Vector Machines , 2002 .
[28] Shie Mannor,et al. Sparse Online Greedy Support Vector Regression , 2002, ECML.
[29] Carl E. Rasmussen,et al. Gaussian Processes in Reinforcement Learning , 2003, NIPS.
[30] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.
[31] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[32] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[33] Shie Mannor,et al. The kernel recursive least-squares algorithm , 2004, IEEE Transactions on Signal Processing.
[34] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[35] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[36] Terrence J. Sejnowski,et al. TD(λ) Converges with Probability 1 , 1994, Machine Learning.
[37] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[38] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[39] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[40] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[41] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[42] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[43] R. Clayton,et al. Epicardial ECG Mapping of Human Ventricular Fibrillation , 2006 .
[44] Kevin M. Passino,et al. Biomimicry for Optimization, Control and Automation , 2004, IEEE Transactions on Automatic Control.
[45] Xin Xu,et al. Kernel Least-Squares Temporal Difference Learning , 2006 .