Exponential Convergence and Stability of Howard's Policy Improvement Algorithm for Controlled Diffusions

Optimal control problems are inherently hard to solve because the optimization must be performed simultaneously with updating the underlying system. Starting from an initial guess, Howard's policy improvement algorithm separates the step of updating the trajectory of the dynamical system from the optimization step, and iterating the two should converge to the optimal control. In the discrete space-time setting this is often the case, and even rates of convergence are known. In the continuous space-time setting of controlled diffusions, each iteration of the algorithm consists of solving a linear PDE followed by a pointwise maximization problem. This has been shown to converge in some situations; however, no global rate of convergence is known. The first main contribution of this paper is to establish a global rate of convergence for the policy improvement algorithm and for a variant, called here the gradient iteration algorithm. The second main contribution is a proof of stability of the algorithms under perturbations to both the accuracy of the linear PDE solution and the accuracy of the maximization step. The proof technique is new in this context as it uses the theory of backward stochastic differential equations.
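
To fix ideas, here is a minimal sketch of one policy-improvement step in a generic infinite-horizon discounted setting; the notation ($v_n$, $a_n$, $\mathcal{L}^a$, $f$, $\rho$, $A$) is illustrative and not taken from the paper's own formulation:

\[
\text{(policy evaluation)}\qquad \rho\, v_n(x) - \mathcal{L}^{a_n} v_n(x) - f\big(x, a_n(x)\big) = 0,
\]
\[
\text{(policy improvement)}\qquad a_{n+1}(x) \in \operatorname*{arg\,max}_{a \in A} \Big[\, \mathcal{L}^{a} v_n(x) + f(x, a) \,\Big],
\]

where $\mathcal{L}^a$ denotes the generator of the controlled diffusion with the action frozen at $a$, $f$ is the running reward, $\rho > 0$ is a discount rate, and $A$ is the action set. Each evaluation step is a linear PDE in $v_n$, and each improvement step is a pointwise maximization over $A$; the paper's rate and stability results concern iterating, and perturbing, exactly these two steps.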
