Generalized Dual Dynamic Programming for Infinite Horizon Problems in Continuous State and Action Spaces

We describe a nonlinear generalization of dual dynamic programming theory and its application to value function estimation for deterministic control problems over continuous state and action spaces, in a discrete-time infinite horizon setting. We prove, using a Benders-type argument that leverages the monotonicity of the Bellman operator, that the result of a one-stage policy evaluation can be used to produce nonlinear lower bounds on the optimal value function that are valid over the entire state space. These bounds contain terms reflecting the functional form of the system's costs, dynamics, and constraints. We provide an iterative algorithm that produces successively better approximations of the optimal value function, and prove under certain assumptions that it achieves any desired Bellman optimality tolerance at preselected points in the state space in a finite number of iterations. We also describe means of certifying the quality of the approximate value function generated. We demonstrate the efficacy of the approach on systems whose dimensions are too large for conventional dynamic programming approaches to be practical.
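To make the cut-generation idea concrete, the sketch below implements the classical affine-cut flavour of dual dynamic programming on a hypothetical one-dimensional linear-quadratic problem with a bounded input. The problem data, the finite-difference slopes, and the use of affine cuts (rather than the paper's nonlinear bounds) are simplifying assumptions made here purely for illustration.

```python
from scipy.optimize import minimize_scalar

# Hypothetical problem data (illustrative, not from the paper):
# dynamics x+ = a*x + b*u, stage cost q*x^2 + r*u^2, input bounded to [u_lo, u_hi],
# discount factor gamma < 1 so the infinite-horizon cost is well defined.
a, b, q, r, gamma = 1.05, 1.0, 1.0, 0.1, 0.9
u_lo, u_hi = -1.0, 1.0

# Value function approximation: pointwise maximum of lower-bounding cuts.
# Each cut is (intercept, slope); the trivial cut 0 is valid since costs are nonnegative.
cuts = [(0.0, 0.0)]

def V_hat(x):
    """Current lower bound on the optimal value function at state x."""
    return max(alpha + beta * x for alpha, beta in cuts)

def one_stage(x):
    """One-stage problem (Bellman operator applied to V_hat) at state x."""
    obj = lambda u: q * x ** 2 + r * u ** 2 + gamma * V_hat(a * x + b * u)
    return minimize_scalar(obj, bounds=(u_lo, u_hi), method="bounded").fun

def add_cut(x_hat, eps=1e-4):
    """Add an affine cut at x_hat and return the Bellman gap found there.

    Because V_hat <= V and the Bellman operator T is monotone, T V_hat <= T V = V;
    a supporting tangent (approximated here by finite differences) of the convex
    function T V_hat at x_hat therefore lower-bounds V over the whole state space.
    The paper instead derives nonlinear bounds directly from the structure of the
    costs, dynamics, and constraints.
    """
    before = V_hat(x_hat)
    val = one_stage(x_hat)
    slope = (one_stage(x_hat + eps) - one_stage(x_hat - eps)) / (2.0 * eps)
    cuts.append((val - slope * x_hat, slope))
    return val - before

# Iterate over preselected sample states until the Bellman gap is below tolerance.
samples, tol, max_sweeps = [-2.0, -0.5, 0.5, 2.0], 1e-3, 200
for sweep in range(max_sweeps):
    gaps = [add_cut(x) for x in samples]
    if max(gaps) < tol:
        print(f"Bellman gap below {tol} at every sample after {sweep + 1} sweeps")
        break
else:
    print(f"stopped after {max_sweeps} sweeps; largest remaining gap {max(gaps):.2e}")
```

The pointwise maximum of the cuts remains a valid lower bound throughout, which is the property the Benders-type argument in the abstract exploits; the paper's contribution is to extend this construction to nonlinear cuts for general continuous-state, continuous-action problems.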
