Approximate Optimal Stabilization Control of Servo Mechanisms Based on a Reinforcement Learning Scheme

A reinforcement learning (RL) based adaptive dynamic programming (ADP) scheme is developed to learn the approximate optimal stabilizing input of servo mechanisms, where the unknown system dynamics are approximated with a three-layer neural network (NN) identifier. First, the servo mechanism model is constructed, and a three-layer NN identifier is used to approximate the unknown servo system; the NN weights of both the hidden layer and the output layer are tuned synchronously with an adaptive gradient law. An RL-based three-layer critic NN is then used to learn the optimal cost function, where the first-layer weights are fixed as constants and the second-layer weights are updated by minimizing the squared Hamilton-Jacobi-Bellman (HJB) error. The approximate optimal stabilizing input of the servo mechanism is obtained from the three-layer NN identifier and the RL-based critic NN, and it steers the motor speed from its initial value to the given reference value. Moreover, the convergence of the identifier and of the RL-based critic NN is proved, and the stability of the system under the proposed optimal input is analyzed. Finally, a servo mechanism model and a more complex system are used to verify the effectiveness of the proposed method.
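To make the identifier-critic structure concrete, the following Python sketch illustrates one possible discrete-time implementation under simplifying assumptions: a scalar "true" servo speed model, a small tanh identifier network whose hidden- and output-layer weights are tuned by gradient descent on the prediction error, and a critic whose first-layer weights are fixed while the second-layer weights are updated to reduce the squared HJB residual. All numerical values, network sizes, and the plant model are illustrative assumptions, not the paper's exact design.

```python
# Minimal identifier-critic ADP sketch (illustrative, not the paper's exact algorithm).
import numpy as np

rng = np.random.default_rng(0)

# "True" servo speed dynamics (assumed, unknown to the learner): x_dot = a*x + b*u.
a_true, b_true = -2.0, 1.5
def plant(x, u):
    return a_true * x + b_true * u

dt = 1e-3
Q, R = 1.0, 0.1                              # quadratic cost weights (assumed)

# Three-layer NN identifier: x_dot_hat = W2 @ tanh(W1 @ [x, u]); both layers tuned.
n_h = 8
W1 = 0.1 * rng.standard_normal((n_h, 2))     # hidden-layer weights
W2 = 0.1 * rng.standard_normal((1, n_h))     # output-layer weights
eta_id = 5.0                                 # identifier learning rate

# Critic NN: V(x) ~ Wc @ tanh(C @ [x]); first-layer C fixed, second-layer Wc tuned.
n_c = 6
C = rng.standard_normal((n_c, 1))            # fixed first-layer weights
Wc = 0.1 * rng.standard_normal(n_c)          # second-layer weights
eta_c = 2.0                                  # critic learning rate

def critic_grad_x(x):
    """dV/dx for V = Wc @ tanh(C x)."""
    z = np.tanh(C @ np.array([x]))
    return float(Wc @ ((1.0 - z**2) * C[:, 0]))

x = 2.0                                      # initial speed (regulation to zero)
for k in range(20000):
    # Approximate optimal input u = -(1/2) R^{-1} g_hat * dV/dx,
    # where g_hat is the identifier's estimate of the input gain at u = 0.
    z1 = np.tanh(W1 @ np.array([x, 0.0]))
    g_hat = float(W2 @ ((1.0 - z1**2) * W1[:, 1]))
    u = -0.5 / R * g_hat * critic_grad_x(x)

    # Identifier update: gradient descent on the squared prediction error.
    z = np.tanh(W1 @ np.array([x, u]))
    x_dot_hat = float(W2 @ z)
    e_id = x_dot_hat - plant(x, u)
    W2 -= dt * eta_id * e_id * z[None, :]
    W1 -= dt * eta_id * e_id * ((W2[0] * (1.0 - z**2))[:, None] @ np.array([[x, u]]))

    # Critic update: gradient descent on the squared HJB residual
    #   e_hjb = Q x^2 + R u^2 + dV/dx * x_dot_hat.
    e_hjb = Q * x**2 + R * u**2 + critic_grad_x(x) * x_dot_hat
    zc = np.tanh(C @ np.array([x]))
    dgrad_dWc = (1.0 - zc**2) * C[:, 0]       # d(dV/dx)/dWc
    Wc -= dt * eta_c * e_hjb * dgrad_dWc * x_dot_hat

    # Integrate the real plant one Euler step.
    x += dt * plant(x, u)

print("speed after learning:", x)             # expected to decay toward the set-point
```

The sketch keeps the structural split described above: the identifier supplies the dynamics estimate used in both the HJB residual and the control law, while only the critic's second-layer weights adapt online.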
