Natural Actor-Critic

[1]  Dimitri P. Bertsekas, et al.  Neuro-Dynamic Programming, 2009, Encyclopedia of Optimization.

[2]  Xinhua Zhang, et al.  Conditional random fields for multi-agent reinforcement learning, 2007, ICML '07.

[3]  Stefan Schaal, et al.  Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning, 2007, ESANN.

[4]  Aude Billard, et al.  Reinforcement learning for imitating constrained reaching movements, 2007, Adv. Robotics.

[5]  Xinhua Zhang, et al.  Conditional Random Fields for Reinforcement Learning, 2007.

[6]  Csaba Szepesvári.  Natural Actor-Critic, 2007.

[7]  Olivier Buffet, et al.  Shaping multi-agent systems with gradient reinforcement learning, 2007, Autonomous Agents and Multi-Agent Systems.

[8]  Jin Yu, et al.  Natural Actor-Critic for Road Traffic Optimisation, 2006, NIPS.

[9]  Stefan Schaal, et al.  Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[10]  Shin Ishii, et al.  Fast and Stable Learning of Quasi-Passive Dynamic Walking by an Unstable Biped Robot based on Off-Policy Natural Actor-Critic, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[11]  Douglas Aberdeen, et al.  POMDPs and Policy Gradients, 2006.

[12]  Jongho Kim, et al.  An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm, 2005, CIS.

[13]  Stefan Schaal, et al.  Reinforcement Learning for Humanoid Robotics, 2003.

[14]  Jeff G. Schneider, et al.  Covariant policy search, 2003, IJCAI.

[15]  Douglas Aberdeen, et al.  Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.

[16]  Sethu Vijayakumar, et al.  Scaling Reinforcement Learning Paradigms for Motor Learning, 2003.

[17]  Jun Nakanishi, et al.  Learning rhythmic movements by demonstration using nonlinear oscillators, 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[18]  Peter L. Bartlett, et al.  An Introduction to Reinforcement Learning Theory: Value Function Methods, 2002, Machine Learning Summer School.

[19]  Sham M. Kakade, et al.  A Natural Policy Gradient, 2001, NIPS.

[20]  Kenji Fukumizu, et al.  Local minima and plateaus in hierarchical structures of multilayer perceptrons, 2000, Neural Networks.

[21]  Yishay Mansour, et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[22]  T. Moon, et al.  Mathematical Methods and Algorithms for Signal Processing, 1999.

[23]  Justin A. Boyan, et al.  Least-Squares Temporal Difference Learning, 1999, ICML.

[24]  Vijay R. Konda, et al.  Actor-Critic Algorithms, 1999, NIPS.

[25]  Andrew W. Moore, et al.  Gradient Descent for General Reinforcement Learning, 1998, NIPS.

[26]  Andrew G. Barto, et al.  Reinforcement Learning, 1998.

[27]  Richard S. Sutton, et al.  Introduction to Reinforcement Learning, 1998.

[28]  Shun-ichi Amari, et al.  Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.

[29]  Richard S. Sutton, et al.  Dimensions of Reinforcement Learning, 1998.

[30]  Andrew G. Barto, et al.  Adaptive linear quadratic control using policy iteration, 1994, Proceedings of 1994 American Control Conference - ACC '94.