Uncertainty handling CMA-ES for reinforcement learning

The covariance matrix adaptation evolution strategy (CMA-ES) has proven to be a powerful method for reinforcement learning (RL). Recently, the CMA-ES has been augmented with an adaptive uncertainty handling mechanism. Because uncertainty is a typical property of RL problems, this new algorithm, termed UH-CMA-ES, is promising for RL. The UH-CMA-ES dynamically adjusts the number of episodes considered in each evaluation of a policy. It keeps the signal-to-noise ratio just high enough for a sufficiently reliable ranking of candidate policies, which in turn allows the evolutionary learning to find better solutions. This significantly increases learning speed and robustness without impairing the quality of the final solutions. We evaluate the UH-CMA-ES on fully and partially observable Markov decision processes with random start states and noisy observations. A canonical natural policy gradient method and random search serve as baselines for comparison.
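
The core idea of rank-change-based uncertainty handling can be illustrated with a minimal sketch, not the exact UH-CMA-ES procedure of the paper: each candidate policy is evaluated twice, the disagreement between the two resulting rankings is taken as a measure of noise, and the number of episodes per evaluation is raised or lowered accordingly. The function and parameter names (`evaluate`, `theta`, `alpha`) and the specific constants are illustrative assumptions.

```python
import numpy as np

def uh_reevaluate(policies, evaluate, n_episodes, theta=0.2, alpha=1.5):
    """Sketch of rank-change-based uncertainty handling.

    evaluate(policy, n_episodes) -> mean return over n_episodes rollouts.
    Returns averaged fitness values and the adapted number of episodes.
    """
    lam = len(policies)

    # Evaluate every candidate policy twice under the same budget.
    f1 = np.array([evaluate(p, n_episodes) for p in policies])
    f2 = np.array([evaluate(p, n_episodes) for p in policies])

    # Rank all 2*lam measurements jointly and measure how much each
    # candidate's rank changes between the two evaluations.
    ranks = np.argsort(np.argsort(np.concatenate([f1, f2])))
    rank_change = np.abs(ranks[:lam] - ranks[lam:])

    # Uncertainty level: mean rank change relative to a tolerance theta.
    uncertainty = rank_change.mean() / lam - theta

    if uncertainty > 0:
        # Noise disturbs the ranking too much: spend more episodes
        # per evaluation in the next generation.
        n_episodes = int(np.ceil(alpha * n_episodes))
    else:
        # Ranking is reliable enough: save rollouts.
        n_episodes = max(1, int(n_episodes / alpha))

    # Use the average of both measurements for selection.
    return (f1 + f2) / 2.0, n_episodes
```

In the actual UH-CMA-ES only a fraction of the offspring is re-evaluated and the rank-change statistic is compared against quantile-based limits; the sketch above collapses these details into a single threshold for readability.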
