Finite-Horizon Discounted Optimal Control: Stability and Performance

Motivated by (approximate) dynamic programming and model predictive control problems, we analyse the stability of deterministic nonlinear discrete-time systems whose inputs minimize a discounted finite-horizon cost. We assume that the system satisfies stabilizability and detectability properties with respect to the stage cost. Under these assumptions, we construct a Lyapunov function for the closed-loop system and establish a uniform semiglobal stability property, in which the adjustable parameters are the discount factor and the horizon length; the latter corresponds to the number of iterations of dynamic programming algorithms such as value iteration. Stronger stability properties, such as global exponential stability, are obtained by strengthening the initial assumptions. We give bounds on the discount factor and the horizon length under which stability holds, and we provide conditions under which these bounds are less conservative than those in the literature for discounted infinite-horizon costs and undiscounted finite-horizon costs, respectively. In addition, we provide new relationships between the optimal value functions of the discounted and undiscounted, infinite-horizon and finite-horizon costs, which differ substantially from those available in the approximate dynamic programming literature and rely on assumptions that are more likely to be satisfied in a control context. Finally, we investigate stability when only a near-optimal sequence of inputs for the discounted finite-horizon cost is available, covering approximate value iteration as a particular case.
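To make the role of the two tuning parameters concrete, the following minimal sketch runs value iteration for a discounted finite-horizon cost on a scalar linear system with quadratic stage cost. This toy setting is an assumption chosen for illustration only; the analysis described above concerns general nonlinear systems. The system x+ = a*x + b*u, the stage cost q*x**2 + r*u**2 with zero terminal cost, and the function names `riccati_coefficients` and `closed_loop_pole` are all hypothetical. In this setting the optimal value function with k stages is p_k * x**2, so value iteration reduces to a scalar recursion on p_k.

```python
def riccati_coefficients(a, b, q, r, gamma, horizon):
    """Return [p_0, ..., p_horizon], where p_k * x**2 is the optimal
    discounted cost over k stages (zero terminal cost, so p_0 = 0)."""
    p = [0.0]
    for _ in range(horizon):
        c = gamma * p[-1]  # discounted cost-to-go coefficient
        # Closed-form minimum over u of q*x^2 + r*u^2 + c*(a*x + b*u)^2
        p.append(q + c * a**2 * r / (r + c * b**2))
    return p


def closed_loop_pole(a, b, q, r, gamma, horizon):
    """Closed-loop multiplier of x+ = a*x + b*u when u is the first input
    of an optimal sequence for the discounted cost with `horizon` stages."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # The first input minimizes stage cost + gamma * (cost over horizon-1 stages).
    c = gamma * riccati_coefficients(a, b, q, r, gamma, horizon - 1)[-1]
    gain = c * a * b / (r + c * b**2)  # optimal feedback is u = -gain * x
    return a - b * gain


if __name__ == "__main__":
    # Open-loop unstable example (a = 2); the origin is stable iff |pole| < 1.
    for gamma in (1.0, 0.5, 0.1):
        for horizon in (1, 2, 3, 10):
            pole = closed_loop_pole(2.0, 1.0, 1.0, 1.0, gamma, horizon)
            print(f"gamma={gamma}, horizon={horizon}, pole={pole:.3f}")
```

For a = 2 and b = q = r = 1, a horizon of 1 leaves the system uncontrolled (multiplier 2), while a horizon of 3 with no discounting (gamma = 1) gives a closed-loop multiplier of 0.5. With a small discount factor such as gamma = 0.1, the multiplier stays above 1 even for long horizons, which illustrates why the discount factor and the horizon length must be chosen jointly.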
