Covariance Matrix Adaptation for Direct Reinforcement Learning

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI^2 is a recent example of this approach.
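The black-box policy-improvement loop behind PI^2-style methods can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the quadratic cost, the constants, and the simplified reward-weighted covariance update are all hypothetical, chosen only to show how exploration noise and its covariance are adapted from sampled rollout costs.

```python
import numpy as np

def cost(theta):
    # Hypothetical task: the optimal policy parameters are (1, 1, 1).
    return np.sum((theta - 1.0) ** 2)

def policy_improvement(theta, sigma, n_samples=20, n_iters=100, h=10.0):
    """Reward-weighted policy search with covariance adaptation (sketch)."""
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        # Explore: perturb the policy parameters with Gaussian noise.
        samples = rng.multivariate_normal(theta, sigma, size=n_samples)
        costs = np.array([cost(s) for s in samples])
        # Soft-max over negated, normalized costs (as in PI^2-like methods).
        c = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
        p = np.exp(-h * c)
        p /= p.sum()
        # Reward-weighted averaging updates the mean ...
        theta = p @ samples
        # ... and the covariance, so exploration adapts to the cost landscape.
        diff = samples - theta
        sigma = np.einsum('n,ni,nj->ij', p, diff, diff)
        sigma += 1e-6 * np.eye(len(theta))  # keep the matrix positive definite
    return theta

theta = policy_improvement(np.zeros(3), np.eye(3))
# theta typically ends up close to the optimum at (1, 1, 1)
```

The key design point this sketch illustrates is that the exploration magnitude is not a hand-tuned constant: the covariance shrinks automatically as the sampled costs concentrate, which is the adaptation the title refers to.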

[1] Martin V. Butz, et al. Context-dependent predictions and cognitive arm control with XCSF, 2008, GECCO '08.

[2] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.

[3] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.

[4] Stefan Schaal, et al. A Generalized Path Integral Control Approach to Reinforcement Learning, 2010, J. Mach. Learn. Res.

[5] Bart De Schutter, et al. Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions, 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[6] Stefan Schaal, et al. Learning to grasp under uncertainty, 2011, IEEE International Conference on Robotics and Automation.

[7] Nikolaus Hansen, et al. Completely Derandomized Self-Adaptation in Evolution Strategies, 2001, Evolutionary Computation.

[8] Lionel Rigoux, et al. Learning cost-efficient control policies with XCSF: generalization capabilities and further improvement, 2011, GECCO '11.

[9] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.

[10] Tom Schaul, et al. Exploring parameter space in reinforcement learning, 2010, Paladyn J. Behav. Robotics.

[11] András Lőrincz, et al. Learning Tetris Using the Noisy Cross-Entropy Method, 2006, Neural Computation.

[12] Olivier Sigaud, et al. Towards fast and adaptive optimal control policies for robots: A direct policy search approach, 2012.

[13] Jon C. Dattorro, et al. Convex Optimization & Euclidean Distance Geometry, 2004.

[14] Olivier Sigaud, et al. Path Integral Policy Improvement with Covariance Matrix Adaptation, 2012, ICML.

[15] M. Guay, et al. Excitation Signal Design for Parameter Convergence in Adaptive Control of Linearizable Systems, 2006, Proceedings of the 45th IEEE Conference on Decision and Control.

[16] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.

[17] Jiang Wang, et al. Intelligent Excitation for Adaptive Control With Unknown Parameters in Reference Input, 2007, IEEE Transactions on Automatic Control.

[18] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2011, Machine Learning.

[19] Ales Ude, et al. Learning to pour with a robot arm combining goal and shape learning for dynamic movement primitives, 2011, Robotics Auton. Syst.

[20] D. Marin, et al. Reaching optimally over the workspace: A machine learning approach, 2012, 4th IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob).

[21] Nikolaus Hansen, et al. The CMA Evolution Strategy: A Comparing Review, 2006, Towards a New Evolutionary Computation.

[22] Marin Kobilarov, et al. Cross-Entropy Randomized Motion Planning, 2011, Robotics: Science and Systems.

[23] Freek Stulp. Adaptive exploration for continual reinforcement learning, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[24] Jun Nakanishi, et al. Movement imitation with nonlinear dynamical systems in humanoid robots, 2002, IEEE International Conference on Robotics and Automation.

[25] Christian Igel, et al. Evolution Strategies for Direct Policy Search, 2008, PPSN.

[26] B. Wittenmark. Adaptive dual control, 2002.