Path Integral Policy Improvement with Covariance Matrix Adaptation

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods that share the concept of probability-weighted averaging to iteratively update parameters in order to optimize a cost function. We compare PI2 to other members of the same family, Cross-Entropy Methods and CMA-ES, at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm, which we call PI2-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
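
To make the shared concept concrete, the Python sketch below illustrates one iteration of probability-weighted averaging over sampled policy parameters, including a probability-weighted covariance update in the spirit of PI2-CMA. The function name, the soft-max eliteness parameter h, the regularization constant, and the quadratic test cost are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def probability_weighted_update(theta, sigma, cost_fn, n_samples=20, h=10.0):
        """One iteration of probability-weighted averaging (PI2/CEM/CMA-ES family).

        theta   -- current policy parameter vector (mean of the sampling Gaussian)
        sigma   -- current covariance matrix of the exploration noise
        cost_fn -- maps a parameter vector to a scalar cost (lower is better)
        h       -- eliteness temperature: larger h concentrates weight on low-cost samples
        """
        # Sample perturbed parameter vectors from the current Gaussian.
        samples = np.random.multivariate_normal(theta, sigma, size=n_samples)
        costs = np.array([cost_fn(s) for s in samples])

        # Map costs to probabilities with a soft-max over normalized costs,
        # so that low-cost samples receive high weight.
        c_min, c_max = costs.min(), costs.max()
        weights = np.exp(-h * (costs - c_min) / (c_max - c_min + 1e-10))
        weights /= weights.sum()

        # Probability-weighted averaging of the samples gives the new mean ...
        theta_new = weights @ samples

        # ... and a probability-weighted covariance update (the CMA-style
        # extension), which adapts the exploration noise automatically.
        diffs = samples - theta
        sigma_new = (weights[:, None] * diffs).T @ diffs
        sigma_new += 1e-6 * np.eye(len(theta))  # keep the covariance well-conditioned

        return theta_new, sigma_new

    # Example: minimize a simple quadratic cost over 5 parameters.
    theta, sigma = np.zeros(5), np.eye(5)
    for _ in range(100):
        theta, sigma = probability_weighted_update(theta, sigma, lambda t: float(np.sum(t ** 2)))

Updating the covariance from the same weighted samples is what allows the exploration magnitude to shrink or grow as learning progresses, rather than being hand-tuned.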
