Efficient sampling in approximate dynamic programming algorithms

Abstract

Dynamic Programming (DP) is a standard optimization tool for solving Stochastic Optimal Control (SOC) problems over either a finite or an infinite horizon of stages. Under very general assumptions, commonly employed numerical algorithms rely on approximations of the cost-to-go functions by means of suitable parametric models built from a set of sampling points in the d-dimensional state space. Here we discuss the problem of sample complexity, i.e., how "fast" the number of points must grow with the input dimension in order to obtain an accurate estimate of the cost-to-go functions in typical DP approaches such as value iteration and policy iteration. It is shown that choosing the sampling points from low-discrepancy sequences, commonly used for efficient numerical integration, makes it possible to achieve, under suitable hypotheses, an almost linear sample complexity, thus helping to mitigate the curse of dimensionality of the approximate DP procedure.
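As a minimal illustration of the kind of low-discrepancy sampling the abstract refers to (a sketch only, not the paper's actual construction, which relies on sequences such as Niederreiter's), the Halton sequence fills the d-dimensional unit hypercube far more evenly than i.i.d. uniform sampling, using the radical-inverse function in a distinct prime base per coordinate:

```python
def radical_inverse(n, base):
    """Van der Corput radical inverse: reflect the base-`base` digits
    of the positive integer n about the radix point, giving a value in [0, 1)."""
    inv, f = 0.0, 1.0 / base
    while n > 0:
        inv += (n % base) * f
        n //= base
        f /= base
    return inv

def halton(num_points, dim):
    """Return the first `num_points` points of the `dim`-dimensional
    Halton sequence, each coordinate using a distinct prime base."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # enough for dim <= 10
    if dim > len(primes):
        raise ValueError("this sketch supports at most 10 dimensions")
    return [[radical_inverse(i, primes[d]) for d in range(dim)]
            for i in range(1, num_points + 1)]

# Example: sample a 3-dimensional state space at 100 points; these points
# would serve as the grid on which the cost-to-go function is fitted.
points = halton(100, 3)
```

In an approximate DP scheme, such a point set would replace a pseudo-random or full tensor-product grid as the locations where the cost-to-go function is evaluated and the parametric model (e.g., a neural network) is trained.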
