Neural H2 Control Using Reinforcement Learning for Unknown Nonlinear Systems

In this paper we study discrete-time ${\mathcal{H}_2}$ control for unknown nonlinear systems. We use recurrent neural networks for system identification, and then apply ${\mathcal{H}_2}$ tracking control. The neural-network-based critic control does not require knowledge of the system dynamics. Our optimal control policy combines a recursive solution of the discrete algebraic Riccati equation with reinforcement learning. The stability of both the system identification and the ${\mathcal{H}_2}$ tracking control is proven. The convergence of the approach is also established using Lyapunov stability theory. The proposed method is validated on the control of a surge tank.
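As a rough illustration of the recursive Riccati step, the sketch below iterates the discrete algebraic Riccati equation to a fixed point and extracts the optimal state-feedback gain. It assumes a linear model $(A, B)$, which in the paper's setting would come from the identified recurrent neural network; the function name `dare_recursive` and all numerical values are illustrative placeholders, not the authors' code.

```python
import numpy as np

def dare_recursive(A, B, Q, R, tol=1e-9, max_iter=10_000):
    """Solve the discrete algebraic Riccati equation by fixed-point
    iteration (value iteration on the Riccati recursion):
        P <- A'PA - A'PB (R + B'PB)^{-1} B'PA + Q
    Returns the stabilizing solution P and the optimal gain K."""
    P = np.copy(Q)
    for _ in range(max_iter):
        # K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = A.T @ P @ A - A.T @ P @ B @ K + Q
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, K
        P = P_next
    raise RuntimeError("Riccati recursion did not converge")

# Placeholder linearized model (illustrative values, not from the paper):
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # control weighting

P, K = dare_recursive(A, B, Q, R)
u = lambda x: -K @ x   # optimal state-feedback law u_k = -K x_k
```

In the paper's method this recursion would be evaluated along the trajectories learned by the critic, rather than on a fixed $(A, B)$ pair as in this simplified sketch.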
