Continuous-Time Time-Varying Policy Iteration

A novel policy iteration algorithm, called the continuous-time time-varying (CTTV) policy iteration algorithm, is presented in this paper to obtain the optimal control laws for infinite-horizon CTTV nonlinear systems. The adaptive dynamic programming (ADP) technique is used to obtain iterative control laws that optimize the performance index function. The monotonicity, convergence, and optimality of the iterative value function are analyzed, and the iterative value function is shown to converge monotonically to the optimal solution of the Hamilton–Jacobi–Bellman (HJB) equation. Furthermore, each iterative control law is guaranteed to be admissible, so it stabilizes the nonlinear system. In the implementation of the presented CTTV policy iteration algorithm, neural networks approximate the iterative control laws and the iterative value function. Finally, numerical results are given to verify the effectiveness of the presented method.
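To make the evaluate-then-improve loop concrete, the following is a minimal sketch of policy iteration on a much simpler problem than the one treated in the paper: a scalar, time-invariant linear system dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2 and linear policy u = -k*x. It is not the CTTV algorithm and uses no neural networks; the system, the cost, and all function names are assumptions chosen for illustration. In this special case the value function is V(x) = p*x^2, policy evaluation reduces to a scalar Lyapunov equation, and the optimal p solves a scalar Riccati equation.

```python
import math


def evaluate_policy(a, b, q, r, k):
    """Policy evaluation: solve 2*p*(a - b*k) + q + r*k**2 = 0 for p.

    The gain k must be admissible (stabilizing), i.e. a - b*k < 0.
    """
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("gain k is not stabilizing (not admissible)")
    return (q + r * k * k) / (-2.0 * closed_loop)


def improve_policy(b, r, p):
    """Policy improvement: minimize the Hamiltonian, giving k = b*p/r."""
    return b * p / r


def policy_iteration(a, b, q, r, k0, iterations=6):
    """Alternate evaluation and improvement, starting from admissible k0."""
    k = k0
    history = []  # value coefficients p_0, p_1, ...
    for _ in range(iterations):
        p = evaluate_policy(a, b, q, r, k)
        history.append(p)
        k = improve_policy(b, r, p)
    return history, k


def riccati_solution(a, b, q, r):
    """Positive root of 2*a*p - (b*p)**2/r + q = 0 (the scalar HJB solution)."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    history, k_final = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    for i, p in enumerate(history):
        print(f"iteration {i}: p = {p:.12f}")
    print(f"Riccati solution: p* = {riccati_solution(1.0, 1.0, 1.0, 1.0):.12f}")
```

In this toy setting the sequence of value coefficients is non-increasing and approaches the Riccati solution, and every improved gain remains stabilizing, which mirrors, in a far simpler case, the monotonicity, convergence, and admissibility properties stated in the abstract.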
