Stable Optimal Control and Semicontractive Dynamic Programming

We consider discrete-time, infinite horizon, deterministic optimal control problems with nonnegative cost per stage and a destination that is cost-free and absorbing. The classical linear-quadratic regulator problem is a special case. Our assumptions are very general and allow the possibility that the optimal policy may not stabilize the system, e.g., may not drive the state to the destination either asymptotically or in a finite number of steps. We introduce a new unifying notion of stable feedback policy, based on perturbation of the cost per stage, which, in addition to implying convergence of the generated states to the destination, quantifies the speed of convergence. We consider the properties of two distinct cost functions: $J^*$, the overall optimal, and $\hat J$, the restricted optimal over just the stable policies. Different classes of stable policies (with different speeds of convergence) may yield different values of $\hat J$. We show that for any such class, $\hat J$ is a solution of Bellman's equation, and we characterize the smallest and largest solutions: they are $J^*$ and $J^+$, the restricted optimal cost function over the class of (finitely) terminating policies. We also characterize the regions of convergence of various modified versions of the value and policy iteration algorithms, as substitutes for the standard algorithms, which may not work in general.
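
To illustrate how $J^*$ and $\hat J$ can differ, and how the limit of value iteration depends on where it starts, the sketch below uses an assumed scalar linear-quadratic instance (not taken from the paper): $x_{k+1} = \gamma x_k + u_k$ with $\gamma > 1$, stage cost $u_k^2$, and destination $x = 0$. The policy $u \equiv 0$ incurs zero cost but does not stabilize the system, so $J^*(x) = 0$, while the restricted optimal cost over stable policies is $\hat J(x) = (\gamma^2 - 1)x^2$, corresponding to the positive solution of the Riccati equation. On the quadratic class $J(x) = P x^2$, value iteration reduces to iterating the Riccati map, and its limit depends on the starting point. The constants below are illustrative assumptions.

```python
import numpy as np

# Scalar linear system x_{k+1} = gamma * x_k + u_k with gamma > 1 (unstable),
# stage cost u_k^2 (no state cost), destination x = 0.
# On the quadratic class J(x) = P * x**2, one value-iteration step reduces to
# the Riccati map P <- gamma**2 * P / (1 + P).

GAMMA = 2.0  # assumed example value; any gamma > 1 exhibits the phenomenon


def riccati_map(P, gamma=GAMMA):
    """One value-iteration step restricted to quadratic functions J(x) = P x^2."""
    return gamma**2 * P / (1.0 + P)


def value_iteration(P0, iters=100):
    """Iterate the Riccati map from the initial quadratic coefficient P0."""
    P = P0
    for _ in range(iters):
        P = riccati_map(P)
    return P


# Starting at P0 = 0 (i.e., J = 0), value iteration stays at the overall
# optimal J*(x) = 0, attained by the non-stabilizing policy u = 0.
print(value_iteration(0.0))    # -> 0.0

# Starting at any P0 > 0, value iteration converges to P = gamma^2 - 1, i.e.,
# to the restricted optimal cost over stable policies, J_hat(x) = (gamma^2 - 1) x^2.
print(value_iteration(1e-6))   # -> approximately 3.0 for gamma = 2
```

In this instance $J^*$ and $\hat J$ are two distinct solutions of Bellman's equation (here, fixed points of the Riccati map), which is the kind of multiplicity the abstract refers to when characterizing the smallest and largest solutions.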
