Projection-Based Constrained Policy Optimization

In this paper, we consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm, Projection-Based Constrained Policy Optimization (PCPO), an iterative method for optimizing policies in a two-step process: the first step performs an unconstrained update, while the second step reconciles the constraint violation by projecting the policy back onto the constraint set. We theoretically analyze PCPO and provide a lower bound on reward improvement, as well as an upper bound on constraint violation, for each policy update. We further characterize the convergence of PCPO for projections based on two different metrics: the L2 norm and the Kullback-Leibler divergence. Our empirical results over several control tasks demonstrate that our algorithm achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
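As a concrete illustration of the two-step update described above, the sketch below shows one PCPO-style iteration in NumPy under simplifying assumptions: the reward step is a TRPO-like natural-gradient step with trust-region size delta, the constraint is linearized around the current parameters, and the reward gradient g, cost gradient a, constraint slack c, and Fisher matrix H are assumed to be supplied by the caller. This is an illustrative sketch, not the paper's implementation.

import numpy as np

def pcpo_update(theta, g, a, c, H, delta, projection="l2"):
    """One illustrative PCPO-style iteration on policy parameters theta.

    theta      : current policy parameters, shape (n,)
    g          : gradient of the reward objective at theta, shape (n,)
    a          : gradient of the cost constraint at theta, shape (n,)
    c          : constraint slack J_C(theta) - d (positive means violated)
    H          : Fisher information matrix at theta, shape (n, n)
    delta      : trust-region size for the reward step
    projection : "l2" or "kl", the metric used in the projection step
    """
    H_inv = np.linalg.inv(H)

    # Step 1: unconstrained reward improvement inside a KL trust region
    # (a TRPO-like natural-gradient step).
    step_dir = H_inv @ g
    theta_mid = theta + np.sqrt(2.0 * delta / (g @ step_dir)) * step_dir

    # Step 2: project theta_mid back onto the linearized constraint set
    # {theta' : c + a^T (theta' - theta) <= 0}.
    violation = c + a @ (theta_mid - theta)
    if violation <= 0.0:
        return theta_mid  # already feasible; no correction needed
    if projection == "l2":
        # Euclidean (L2-norm) projection onto the constraint half-space.
        return theta_mid - (violation / (a @ a)) * a
    # KL-divergence projection, i.e. projection under the Fisher metric.
    return theta_mid - (violation / (a @ H_inv @ a)) * (H_inv @ a)

The only difference between the two projections is the metric: the L2 case scales the correction by a^T a and moves along a, while the KL case uses the Fisher-weighted inner product a^T H^{-1} a and moves along H^{-1} a.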
