Reinforcement learning with rare significant events: direct policy search vs. gradient policy search

This paper shows that CMA-ES, a direct policy search method, fares significantly better than PPO, a gradient policy search method, on a reinforcement learning task where significant events are rare.

Nicolas Bredeche | Jean-Baptiste André | Nicolas Fontbonne | Paul Ecoffet

[1] Jason D. Lee, et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation, 2018, ICML.

[2] Shie Mannor, et al. Reinforcement learning in the presence of rare events, 2008, ICML '08.

[3] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[4] Vivek S. Borkar, et al. A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events, 2006, J. Mach. Learn. Res.

[5] Nicolas Bredeche, et al. Policy Search with Rare Significant Events: Choosing the Right Partner to Cooperate with, 2021, ArXiv.

[6] Shimon Whiteson, et al. OFFER: Off-Environment Reinforcement Learning, 2017, AAAI.