Value and Policy Iterations in Optimal Control and Adaptive Dynamic Programming

In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. Such problems are often taken as the starting point for adaptive dynamic programming. Under very general assumptions, we establish the uniqueness of the solution of Bellman's equation, and we provide convergence results for value and policy iteration.
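
To make the setting concrete, the following is a minimal illustrative sketch (not taken from the paper): value iteration for a small deterministic shortest-path problem with an absorbing terminal state, where the Bellman operator takes the form (TJ)(s) = min over controls of [stage cost + J(next state)]. The graph, costs, and tolerance below are hypothetical examples chosen only for illustration.

```python
# Illustrative value iteration for a deterministic shortest-path problem
# with an absorbing terminal (goal) state; all numbers are made up.

# States 0..3; state 3 is the terminal state with zero cost-to-go.
# successors[s] lists (next_state, transition_cost) pairs.
successors = {
    0: [(1, 2.0), (2, 4.5)],
    1: [(2, 1.0), (3, 6.0)],
    2: [(3, 1.5)],
    3: [(3, 0.0)],   # absorbing terminal state
}
TERMINAL = 3

def bellman_update(J):
    """One application of the Bellman operator:
    (TJ)(s) = min over available moves of [stage cost + J(next state)]."""
    Jnew = {}
    for s, arcs in successors.items():
        if s == TERMINAL:
            Jnew[s] = 0.0
        else:
            Jnew[s] = min(cost + J[t] for t, cost in arcs)
    return Jnew

# Value iteration: start from J0 = 0 and apply the Bellman operator repeatedly.
J = {s: 0.0 for s in successors}
for k in range(50):
    J_next = bellman_update(J)
    if max(abs(J_next[s] - J[s]) for s in J) < 1e-9:
        break
    J = J_next

print(J)  # optimal costs-to-go; here J[0] = 2 + 1 + 1.5 = 4.5
```

In this toy instance the iterates converge to the unique fixed point of the Bellman operator after a few sweeps; the paper's contribution is to identify general conditions under which such uniqueness and convergence (for both value and policy iteration) hold for problems of control to a terminal set.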
