Policy Gradient Methods for Robot Control
[1] R. Bellman. Dynamic programming, 1957, Science.
[2] Vijaykumar Gullapalli, et al. Learning Control Under Extreme Uncertainty, 1992, NIPS.
[3] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[4] Andrew G. Barto, et al. Adaptive linear quadratic control using policy iteration, 1994, Proceedings of 1994 American Control Conference - ACC '94.
[5] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[6] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2005, Machine Learning.
[7] Judy A. Franklin, et al. Biped dynamic walking using reinforcement learning, 1997, Robotics Auton. Syst.
[8] Andrew G. Barto, et al. Reinforcement learning, 1998.
[9] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[10] Shigenobu Kobayashi, et al. Reinforcement learning for continuous action using stochastic gradient ascent, 1998.
[11] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[12] J. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes: implementation issues, 1999, Proceedings of the 38th IEEE Conference on Decision and Control (Cat. No.99CH36304).
[13] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[14] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[15] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[16] Karsten Berns, et al. Adaptive biologically inspired control for the four-legged walking machine BISAM, 1999.
[17] T. Moon, et al. Mathematical Methods and Algorithms for Signal Processing, 1999.
[18] Chaouki T. Abdallah, et al. Linear Quadratic Control: An Introduction, 2000.
[19] Michail G. Lagoudakis, et al. Model-Free Least-Squares Policy Iteration, 2001, NIPS.
[20] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[21] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[22] Stefan Schaal, et al. Forward models in visuomotor control, 2002, Journal of neurophysiology.
[23] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.
[24] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[25] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.