Learning cost-efficient control policies with XCSF: generalization capabilities and further improvement

In this paper we present a method based on the "learning from demonstration" paradigm for obtaining a cost-efficient control policy in a continuous state and action space. The controlled plant is a two degrees-of-freedom planar arm actuated by six muscles. We learn a parametric control policy with XCSF from a few near-optimal trajectories, and we study its ability to generalize over the whole reachable space. Furthermore, we show that an additional Cross-Entropy Policy Search step can improve the global performance of the parametric controller.
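As context for the Cross-Entropy Policy Search step mentioned above, the generic cross-entropy method can be sketched as follows: sample candidate parameter vectors from a Gaussian, keep the lowest-cost fraction ("elites"), and refit the Gaussian to them. This is a minimal illustration on a toy quadratic cost; the cost function, dimensions, and hyperparameters here are illustrative assumptions, not those used in the paper.

```python
import numpy as np

def cross_entropy_search(cost, dim, iters=50, pop=100, elite_frac=0.1, seed=0):
    """Generic cross-entropy optimization: sample parameter vectors from a
    diagonal Gaussian, keep the lowest-cost elites, refit mean and std."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        costs = np.array([cost(s) for s in samples])
        elites = samples[np.argsort(costs)[:n_elite]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # noise floor to avoid premature collapse
    return mean

# Toy example: the minimum of this shifted quadratic is at (1, -2).
best = cross_entropy_search(lambda p: np.sum((p - np.array([1.0, -2.0]))**2), dim=2)
```

In a policy-search setting, `cost` would evaluate a full trajectory rollout of the parametric controller rather than a closed-form function.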
