Training a robust reinforcement learning controller for uncertain systems based on the policy gradient method

Abstract The goal of this paper is to design a model-free robust controller for uncertain systems. The uncertainties of a control system mainly consist of model uncertainty and external disturbances, which are widespread in practical applications. Because these uncertainties degrade system performance, we are motivated to train a model-free controller to address this problem. Reinforcement learning, an important branch of machine learning, can achieve well-performing control by optimizing a policy without knowledge of the mathematical plant model. In this paper, we construct a reward function module that describes the specific environment of the concerned system, taking the uncertainties into account. We then apply a new policy gradient method to optimize the policy and implement the algorithm with actor-critic neural networks; these two networks constitute our reinforcement learning controller. Finally, we illustrate the applicability and efficiency of the proposed method by applying it to an experimental helicopter platform model that includes model uncertainties and external disturbances.
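To make the actor-critic policy-gradient idea concrete, the following is a minimal sketch, not the paper's implementation: a linear-Gaussian actor and a quadratic critic trained by temporal-difference policy gradient on a toy scalar plant whose dynamics include both model uncertainty (a randomly perturbed state coefficient) and an external disturbance. All names (`plant_step`, `reward`, the gains and learning rates) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant_step(x, u):
    """Toy uncertain plant: x' = a*x + b*u + w, with a perturbed and w a disturbance."""
    a = 0.9 + 0.05 * rng.standard_normal()   # model uncertainty
    b = 1.0
    w = 0.05 * rng.standard_normal()         # external disturbance
    return a * x + b * u + w

def reward(x, u):
    # Quadratic cost on state deviation and control effort, negated as reward.
    return -(x**2 + 0.1 * u**2)

# Actor: u ~ N(theta*x, sigma^2).  Critic: V(x) = v*x^2.
theta, v, sigma = 0.0, 0.0, 0.3
alpha_actor, alpha_critic, gamma = 0.02, 0.05, 0.95

for episode in range(300):
    x = rng.uniform(-1.0, 1.0)
    for t in range(20):
        u = theta * x + sigma * rng.standard_normal()
        x_next = np.clip(plant_step(x, u), -3.0, 3.0)  # clip to keep updates bounded
        r = reward(x, u)
        # TD error serves as the advantage estimate.
        delta = r + gamma * v * x_next**2 - v * x**2
        # Critic update: gradient of V(x) = v*x^2 w.r.t. v is x^2.
        v += alpha_critic * delta * x**2
        # Actor update: score function of the Gaussian policy w.r.t. theta.
        theta += alpha_actor * delta * (u - theta * x) * x / sigma**2
        x = x_next

print(f"learned feedback gain theta = {theta:.2f}")
```

After training, `theta` settles at a negative value, i.e. the policy has learned a stabilizing state-feedback gain despite never seeing the plant's true parameters, which is the model-free property the abstract emphasizes.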
