Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control

For systems that can only be locally stabilized, both the control law and its effective region matter. This paper proposes invariant policy iteration for the optimal control of discrete-time systems. At each iteration, the current policy is evaluated over its invariantly admissible region, and an improved policy together with an updated region is produced for the next iteration. Theoretical analysis shows that the method converges regionally to the optimal value function and the optimal policy. Combined with sum-of-squares polynomials, the method achieves near-optimal control for a class of discrete-time systems. An invariant adaptive dynamic programming algorithm extends the method to the case where the system dynamics are unavailable: online data are used to learn the near-optimal policy and its invariantly admissible region. Simulation experiments verify the effectiveness of the method.
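The iterate-evaluate-improve structure described above specializes, in the linear-quadratic case, to classical policy iteration (Hewer's algorithm), where policy evaluation solves a discrete Lyapunov equation and policy improvement has a closed form. The following is a minimal illustrative sketch, not the paper's invariant method (which additionally tracks the invariantly admissible region); all matrices are hypothetical examples:

```python
import numpy as np

def policy_iteration_lqr(A, B, Q, R, K0, iters=50):
    """Classical policy iteration for discrete-time LQR.

    Requires an initially stabilizing gain K0 (the linear analogue of
    starting from an admissible policy).
    """
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        # Policy evaluation: solve the discrete Lyapunov equation
        #   P = Q + K' R K + Ac' P Ac
        # via vectorization: (I - Ac'⊗Ac') vec(P) = vec(Q + K'RK).
        M = np.eye(n * n) - np.kron(Ac.T, Ac.T)
        rhs = (Q + K.T @ R @ K).reshape(-1)
        P = np.linalg.solve(M, rhs).reshape(n, n)
        # Policy improvement: K <- (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Hypothetical double-integrator-like example
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])          # stabilizing initial gain
P, K = policy_iteration_lqr(A, B, Q, R, K0)
```

Upon convergence, `P` satisfies the discrete-time algebraic Riccati equation, which is the Bellman optimality condition for this linear-quadratic special case.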
