Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search

Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some predefined distribution. In general, the best one can hope for from such an approach is a local optimum of this criterion. The first contribution of this article is the following surprising result: if the policy space is convex, any (approximate) local optimum enjoys a global performance guarantee. Unfortunately, the convexity assumption is strong: it is not satisfied by commonly used parameterizations, and designing a parameterization that induces this property seems hard. A natural way to alleviate this issue is to derive an algorithm that solves the local policy search problem using a boosting approach (constrained to the convex hull of the policy space). The resulting algorithm turns out to be a slight generalization of conservative policy iteration; thus, our second contribution is to highlight an original connection between local policy search and approximate dynamic programming, as illustrated by the sketch below.
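
To make the boosting view concrete, here is a minimal sketch of the conservative mixture update on a small finite MDP with a known model, assuming exact policy evaluation. The function names and the fixed mixing rate alpha are illustrative choices, not the paper's: conservative policy iteration derives its step size from an estimated policy advantage so as to guarantee improvement.

import numpy as np

def q_values(P, r, gamma, pi):
    # P: (S, A, S) transition kernel, r: (S, A) rewards,
    # pi: (S, A) row-stochastic policy. Returns Q^pi, shape (S, A).
    S, A = r.shape
    P_pi = np.einsum('sat,sa->st', P, pi)                 # state kernel induced by pi
    r_pi = np.einsum('sa,sa->s', r, pi)                   # expected one-step reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)   # V^pi = (I - gamma P_pi)^{-1} r_pi
    return r + gamma * np.einsum('sat,t->sa', P, v)       # Q^pi(s, a)

def conservative_policy_iteration(P, r, gamma, n_iter=50, alpha=0.1):
    # Boosted local policy search: each iteration mixes in a greedy
    # "weak learner", so the iterates stay in the convex hull of policies.
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)                         # start from the uniform policy
    for _ in range(n_iter):
        q = q_values(P, r, gamma, pi)
        greedy = np.eye(A)[np.argmax(q, axis=1)]          # one-hot greedy policy w.r.t. Q^pi
        pi = (1.0 - alpha) * pi + alpha * greedy          # conservative (CPI-style) update
    return pi

Because each update is a convex combination of the current policy and a greedy one, the iterates never leave the convex hull of the base policy space, which is exactly the constraint under which the boosting interpretation applies.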
