Adaptive Dynamic Programming for Control: A Survey and Recent Advances

This article reviews recent developments in adaptive dynamic programming (ADP) and its applications in control. First, applications to optimal regulation are introduced, and several effective and efficient algorithms are presented. Next, the use of ADP to solve game problems, mainly nonzero-sum games, is elaborated. Applications to large-scale systems are then discussed. Although the formulations presented in this article are based on continuous-time systems, various applications of ADP to discrete-time systems are also analyzed. In each section, existing techniques are discussed and possible directions for future work are identified. Finally, overall prospects for future research are given, followed by the conclusions of this article. Through a comprehensive investigation of ADP applications across many existing fields, this article demonstrates that ADP-based intelligent control is a promising approach in today's era of artificial intelligence, and that it can play a significant role in promoting economic and social development.
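The Python snippet below is a minimal sketch of the policy-iteration idea that underlies much of ADP-based optimal regulation. It covers only the simplest special case: a continuous-time linear system with quadratic cost, where the Hamilton-Jacobi-Bellman equation reduces to the algebraic Riccati equation. It follows Kleinman's classical policy-iteration scheme and assumes a fully known model (`A`, `B`) and a stabilizing (admissible) initial gain `K0`. The example matrices, the helper names `solve_lyapunov` and `lqr_policy_iteration`, and the tolerance are illustrative assumptions, not taken from the survey. Data-driven ADP variants such as integral reinforcement learning replace the model-based policy-evaluation step with measured trajectory data.

```python
import numpy as np


def solve_lyapunov(M, S):
    """Solve M^T P + P M = -S for P using Kronecker vectorization (column-major vec)."""
    n = M.shape[0]
    I = np.eye(n)
    # vec(M^T P) = (I kron M^T) vec(P) and vec(P M) = (M^T kron I) vec(P)
    L = np.kron(I, M.T) + np.kron(M.T, I)
    vec_p = np.linalg.solve(L, -S.flatten(order="F"))
    P = vec_p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # symmetrize to remove round-off asymmetry


def lqr_policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=50):
    """Kleinman-style policy iteration for continuous-time LQR (hypothetical helper).

    Assumes K0 is stabilizing, i.e. A - B K0 is Hurwitz (an admissible initial policy).
    """
    K = K0
    R_inv = np.linalg.inv(R)
    P_prev = None
    for _ in range(max_iter):
        # Policy evaluation: quadratic cost matrix P of the current policy u = -K x
        Ak = A - B @ K
        P = solve_lyapunov(Ak, Q + K.T @ R @ K)
        # Policy improvement: gain that minimizes the Hamiltonian for this P
        K = R_inv @ B.T @ P
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return P, K


# Illustrative example: open-loop unstable system (eigenvalues +1 and -1)
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[2.0, 2.0]])  # stabilizing initial gain: closed-loop poles at -1, -1

P, K = lqr_policy_iteration(A, B, Q, R, K0)

# Residual of the algebraic Riccati equation A^T P + P A - P B R^-1 B^T P + Q = 0
residual = A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P + Q
print("K =", K)
print("max |ARE residual| =", np.max(np.abs(residual)))
```

For this example the Riccati equation can be solved by hand, giving the optimal gain K = [1 + √2, 1 + √2], which the iteration should reach in a few steps. Policy iteration is shown rather than value iteration because it preserves closed-loop stability at every step once the initial policy is admissible, which is also why ADP policy-iteration methods typically require a stabilizing initial control.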


[204]  Tingwen Huang,et al.  Balancing Value Iteration and Policy Iteration for Discrete-Time Control , 2020, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[205]  Cesare Alippi,et al.  Data-based fault tolerant control for affine nonlinear systems through particle swarm optimized neural networks , 2020, IEEE/CAA Journal of Automatica Sinica.

[206]  Tamer Başar,et al.  H1-Optimal Control and Related Minimax Design Problems , 1995 .

[207]  Frank L. Lewis,et al.  Adaptive Optimal Control of Unknown Constrained-Input Systems Using Policy Iteration and Neural Networks , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[208]  Huaguang Zhang,et al.  Adaptive Dynamic Programming: An Introduction , 2009, IEEE Computational Intelligence Magazine.

[209]  Frank L. Lewis,et al.  Online solution of nonlinear two-player zero-sum games using synchronous policy iteration , 2010, 49th IEEE Conference on Decision and Control (CDC).

[210]  Derong Liu,et al.  Adaptive Dynamic Programming for Optimal Tracking Control of Unknown Nonlinear Systems With Application to Coal Gasification , 2014, IEEE Transactions on Automation Science and Engineering.

[211]  Derong Liu,et al.  Data-Based Optimal Control for Weakly Coupled Nonlinear Systems Using Policy Iteration , 2018, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[212]  P. Olver Nonlinear Systems , 2013 .

[213]  Derong Liu,et al.  Data-Based Adaptive Critic Designs for Nonlinear Robust Optimal Control With Uncertain Dynamics , 2016, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[214]  Derong Liu,et al.  Neural-Network-Based Optimal Control for a Class of Unknown Discrete-Time Nonlinear Systems Using Globalized Dual Heuristic Programming , 2012, IEEE Transactions on Automation Science and Engineering.

[215]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[216]  Jennie Si,et al.  Online learning control by association and reinforcement. , 2001, IEEE transactions on neural networks.

[217]  Derong Liu,et al.  Finite-horizon neuro-optimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach , 2012, Neurocomputing.

[218]  B. Pasik-Duncan,et al.  Adaptive Control , 1996, IEEE Control Systems.

[219]  Haibo He,et al.  Goal Representation Heuristic Dynamic Programming on Maze Navigation , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[220]  Haibo He,et al.  Event-Triggered Adaptive Dynamic Programming for Continuous-Time Systems With Control Constraints , 2017, IEEE Trans. Neural Networks Learn. Syst..

[221]  Hao Xu,et al.  Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses , 2012, Autom..

[222]  Avimanyu Sahoo,et al.  Near Optimal Event-Triggered Control of Nonlinear Discrete-Time Systems Using Neurodynamic Programming , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[223]  Kun Zhang,et al.  Tracking control optimization scheme of continuous-time nonlinear system via online single network adaptive critic design method , 2017, Neurocomputing.

[224]  Jiangping Hu,et al.  Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm , 2019, Inf. Sci..

[225]  G. Saridis,et al.  Suboptimal control for nonlinear stochastic systems , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.

[226]  Ke Wang,et al.  Aperiodic adaptive control for neural-network-based nonzero-sum differential games: A novel event-triggering strategy. , 2019, ISA transactions.

[227]  Andrew G. Barto,et al.  Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..

[228]  Stephen Shields A review of fault detection methods for large systems , 1976 .

[229]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.