Receding-Horizon Actor-Critic Design for Learning-Based Control of Nonlinear Continuous-time Systems

Adaptive dynamic programming (ADP) has recently been studied as a way to solve infinite-horizon optimal control problems for nonlinear continuous-time (CT) systems. In this paper, a receding-horizon actor-critic design (RH-ACD) method is proposed to solve the optimal control problem of nonlinear CT systems. The proposed RH-ACD method adopts the receding-horizon control strategy, which originates from model predictive control (MPC). An actor-critic structure is designed to approximate the time-dependent control policy and value function in each prediction horizon. The network weights of the actor and the critic are updated simultaneously online. The simulation results show that RH-ACD achieves improved control performance and reduced computational cost compared with conventional MPC and infinite-horizon ADP.
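
The receding-horizon idea described above can be illustrated with a minimal sketch. It is not the RH-ACD algorithm itself: the abstract gives no equations, so the example assumes a scalar linear system dx/dt = a*x + b*u with quadratic cost, discretises it with an Euler step, and uses a backward Riccati recursion as a stand-in for the trained actor (the time-dependent gains) and critic (the value weight). All names and parameter values (`finite_horizon_gains`, `simulate`, `horizon_steps`, `apply_steps`, and so on) are hypothetical.

```python
def finite_horizon_gains(a, b, q, r, dt, steps):
    """Time-dependent feedback gains for one prediction horizon.

    Assumption: scalar system dx/dt = a*x + b*u with running cost
    q*x^2 + r*u^2, Euler-discretised with step dt. The backward Riccati
    recursion below stands in for the learned actor and critic.
    """
    ad = 1.0 + a * dt   # discrete-time state coefficient
    bd = b * dt         # discrete-time input coefficient
    p = 0.0             # value weight at the end of the horizon (no terminal cost assumed)
    gains = []
    for _ in range(steps):
        k = (bd * p * ad) / (r * dt + bd * p * bd)
        p = q * dt + ad * p * (ad - bd * k)
        gains.append(k)
    gains.reverse()     # gains[0] now belongs to the start of the horizon
    return gains, p     # p is the value weight at the start of the horizon


def simulate(x0, a=1.0, b=1.0, q=1.0, r=1.0, dt=0.01,
             horizon_steps=200, apply_steps=50, total_steps=1000):
    """Receding-horizon loop: plan over a horizon, apply part of it, re-plan."""
    x = x0
    traj = [x]
    for _ in range(0, total_steps, apply_steps):
        # Re-plan for the horizon starting at the current time. For this
        # time-invariant toy system the result is identical each time; a
        # learning-based scheme would update its networks here instead.
        gains, _ = finite_horizon_gains(a, b, q, r, dt, horizon_steps)
        for k in range(apply_steps):
            u = -gains[k] * x               # time-dependent policy within the horizon
            x = x + dt * (a * x + b * u)    # Euler step of the assumed dynamics
            traj.append(x)
    return traj
```

Calling `simulate(1.0)` returns the closed-loop state trajectory. Only the first `apply_steps` gains of each horizon are used before the horizon is shifted forward, which is the feature that separates a receding-horizon scheme from a single infinite-horizon solution.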
