Natural Actor-Critic

[1]  Dimitri P. Bertsekas, et al.  Neuro-Dynamic Programming, 2009, Encyclopedia of Optimization.

[2]  Xinhua Zhang, et al.  Conditional random fields for multi-agent reinforcement learning, 2007, ICML '07.

[3]  Stefan Schaal, et al.  Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning, 2007, ESANN.

[4]  Aude Billard, et al.  Reinforcement learning for imitating constrained reaching movements, 2007, Adv. Robotics.

[5]  Xinhua Zhang, et al.  Conditional Random Fields for Reinforcement Learning, 2007.

[6]  Csaba Szepesvári.  Natural Actor-Critic, 2007.

[7]  Olivier Buffet, et al.  Shaping multi-agent systems with gradient reinforcement learning, 2007, Autonomous Agents and Multi-Agent Systems.

[8]  Jin Yu, et al.  Natural Actor-Critic for Road Traffic Optimisation, 2006, NIPS.

[9]  Stefan Schaal, et al.  Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[10]  Shin Ishii, et al.  Fast and Stable Learning of Quasi-Passive Dynamic Walking by an Unstable Biped Robot based on Off-Policy Natural Actor-Critic, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[11]  Douglas Aberdeen, et al.  POMDPs and Policy Gradients, 2006.

[12]  Jongho Kim, et al.  An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm, 2005, CIS.

[13]  Stefan Schaal, et al.  Reinforcement Learning for Humanoid Robotics, 2003.

[14]  Jeff G. Schneider, et al.  Covariant policy search, 2003, IJCAI.

[15]  Douglas Aberdeen, et al.  Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.

[16]  Sethu Vijayakumar, et al.  Scaling Reinforcement Learning Paradigms for Motor Learning, 2003.

[17]  Jun Nakanishi, et al.  Learning rhythmic movements by demonstration using nonlinear oscillators, 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[18]  Peter L. Bartlett, et al.  An Introduction to Reinforcement Learning Theory: Value Function Methods, 2002, Machine Learning Summer School.

[19]  Sham M. Kakade, et al.  A Natural Policy Gradient, 2001, NIPS.

[20]  Kenji Fukumizu, et al.  Local minima and plateaus in hierarchical structures of multilayer perceptrons, 2000, Neural Networks.

[21]  Yishay Mansour, et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[22]  T. Moon, et al.  Mathematical Methods and Algorithms for Signal Processing, 1999.

[23]  Justin A. Boyan, et al.  Least-Squares Temporal Difference Learning, 1999, ICML.

[24]  Vijay R. Konda, et al.  Actor-Critic Algorithms, 1999, NIPS.

[25]  Andrew W. Moore, et al.  Gradient Descent for General Reinforcement Learning, 1998, NIPS.

[26]  Andrew G. Barto, et al.  Reinforcement Learning, 1998.

[27]  Richard S. Sutton, et al.  Introduction to Reinforcement Learning, 1998.

[28]  Shun-ichi Amari, et al.  Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.

[29]  Richard S. Sutton, et al.  Dimensions of Reinforcement Learning, 1998.

[30]  Andrew G. Barto, et al.  Adaptive linear quadratic control using policy iteration, 1994, Proceedings of 1994 American Control Conference - ACC '94.