Paper information - Towards fast and adaptive optimal control policies for robots: A direct policy search approach

Optimal control methods are generally too expensive to be applied on-line and in real time to the control of robots. An alternative is to tune a parametrized reactive controller so that it converges to optimal behavior. In this paper we present such a method, based on the “direct policy search” paradigm, to obtain a cost-efficient control policy for a simulated two-degrees-of-freedom planar arm actuated by six muscles. We first learn a parametric controller from demonstration using a few near-optimal trajectories. We then tune the parameters of this controller using two versions of a Cross-Entropy Policy Search method, which we compare. Finally, we show that the resulting controller is 20,000 times faster than an optimal control method producing the same trajectories.

Olivier Sigaud | D. Marin
