Model-based DDPG for motor control

The deep deterministic policy gradient (DDPG) is a recently developed reinforcement learning method that learns a control policy with a deterministic representation. Policy learning directly follows the gradient of the action-value function with respect to the actions. In the same way, DDPG readily provides the gradient of the action-value function with respect to the state. This property allows model information to be incorporated to improve the original DDPG. In this study, a model-based DDPG was implemented as an improvement to the original algorithm: an additional deep network representing the system dynamics was embedded into the conventional DDPG framework, so that the gradient through the model dynamics is also exploited, alongside the critic's action gradient, to learn a policy that maximizes the action value. The model-based DDPG showed an advantage over the original DDPG in an experiment on simulated arm-reaching movement control.
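
To make the combined update concrete, the sketch below shows one plausible form of such a model-based actor update in PyTorch. It is an illustration under stated assumptions, not the paper's implementation: the network names (actor, critic, dynamics), the dimensions, and the specific actor loss are all hypothetical. The loss adds, to the standard DDPG term Q(s, pi(s)), a second term whose gradient reaches the actor by backpropagating dQ/ds' through a learned dynamics network f(s, a).

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for a small motor-control task.
state_dim, action_dim = 6, 2

# Actor pi(s), critic Q(s, a), and learned dynamics f(s, a) -> s'.
# Architectures here are placeholders, not the paper's networks.
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                         nn.Linear(64, state_dim))

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def actor_update(state_batch, gamma=0.99):
    """One policy-improvement step. The critic and dynamics networks are
    held fixed; only the actor parameters are updated."""
    action = actor(state_batch)
    # Model-free term: the critic's value of the current action, whose
    # gradient dQ/da drives the standard DDPG actor update.
    q = critic(torch.cat([state_batch, action], dim=-1))
    # Model-based term: predicted next state s' = f(s, a). The gradient of
    # Q(s', pi(s')) flows back to the actor through the dynamics network,
    # exploiting dQ/ds' in addition to dQ/da.
    next_state = dynamics(torch.cat([state_batch, action], dim=-1))
    next_q = critic(torch.cat([next_state, actor(next_state)], dim=-1))
    # Ascend both value estimates (minimize the negated sum).
    loss = -(q + gamma * next_q).mean()
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()

# Example call with a random batch of 32 states.
actor_update(torch.randn(32, state_dim))
```

In this sketch the critic and dynamics networks would be trained separately, as in conventional DDPG, from transitions in a replay buffer; only the extra gradient path through the dynamics network distinguishes the model-based actor update from the original one.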