Towards fast and adaptive optimal control policies for robots: A direct policy search approach

Optimal control methods are generally too expensive to be applied on-line and in real time to the control of robots. An alternative consists of tuning a parametrized reactive controller so that it converges to optimal behavior. In this paper we present such a method, based on the "direct policy search" paradigm, to obtain a cost-efficient control policy for a simulated two-degrees-of-freedom planar arm actuated by six muscles. We first learn a parametric controller from demonstration, using a few near-optimal trajectories. We then tune the parameters of this controller using two versions of a Cross-Entropy Policy Search method, which we compare. Finally, we show that the resulting controller is 20,000 times faster than an optimal control method producing the same trajectories.
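To illustrate the tuning step, here is a minimal sketch of a generic cross-entropy search over a parameter vector. It is not the paper's implementation: the abstract does not specify the two Cross-Entropy Policy Search variants that are compared, so the diagonal Gaussian sampling distribution, the function names (`cross_entropy_search`, `toy_cost`), and all hyperparameters are assumptions. The quadratic `toy_cost` is a hypothetical stand-in for the cost of rolling out the arm controller.

```python
import random
import statistics


def cross_entropy_search(cost, mean, std, n_samples=50, n_elite=10,
                         n_iters=40, min_std=1e-3, seed=0):
    """Minimize cost(theta) with a diagonal-Gaussian cross-entropy method."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(n_iters):
        # Sample candidate parameter vectors around the current mean.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Keep the lowest-cost candidates (the "elite" set).
        elite = sorted(samples, key=cost)[:n_elite]
        # Refit the sampling distribution to the elite set,
        # with a floor on the spread so sampling never collapses entirely.
        columns = list(zip(*elite))
        mean = [statistics.fmean(c) for c in columns]
        std = [max(statistics.pstdev(c), min_std) for c in columns]
    return mean


def toy_cost(theta):
    """Hypothetical rollout cost: squared distance to a fixed target."""
    target = (1.0, -2.0, 0.5)
    return sum((t - g) ** 2 for t, g in zip(theta, target))


best = cross_entropy_search(toy_cost, mean=[0.0, 0.0, 0.0],
                            std=[2.0, 2.0, 2.0])
```

In a setting like the one described above, `cost` would instead run the parametrized controller in simulation and return the accumulated trajectory cost, and the initial `mean` would come from the controller learned from demonstration.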
