Safety-Aware Reinforcement Learning Framework with an Actor-Critic-Barrier Structure

This paper considers the control problem with simultaneous constraints on the full state and the control input. First, a novel barrier-function-based system transformation is developed to guarantee that the full-state constraints are satisfied. To handle input saturation, a hyperbolic-type penalty function is imposed on the control input. An actor-critic reinforcement learning technique is then combined with the barrier transformation to learn an optimal control policy that accounts for both the full-state constraints and the input saturation. Finally, a numerical simulation illustrates the efficacy of the approach.
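The abstract does not give the formulas, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It assumes a logarithmic barrier on an interval (a, A) with a < 0 < A, and the inverse-hyperbolic-tangent input penalty commonly used in the constrained-input optimal control literature. The function names, the scalar setting, the saturation bound `lam`, and the weight `r` are all illustrative choices, not taken from the paper.

```python
import math


def barrier(z, a, A):
    """Assumed log barrier: maps z in (a, A), a < 0 < A, onto the real line."""
    if not (a < 0.0 < A and a < z < A):
        raise ValueError("need a < 0 < A and a < z < A")
    return math.log((A * (a - z)) / (a * (A - z)))


def barrier_inverse(s, a, A):
    """Inverse of `barrier`: maps any real s back into (a, A)."""
    e = math.exp(s)  # may overflow for very large s; fine for a sketch
    return a * A * (1.0 - e) / (A - a * e)


def input_penalty(u, lam, r=1.0):
    """Assumed hyperbolic-type penalty 2*lam*r * integral_0^u atanh(v/lam) dv,
    evaluated in closed form for a scalar input with |u| < lam."""
    if abs(u) >= lam:
        raise ValueError("need |u| < lam")
    return (2.0 * lam * r * u * math.atanh(u / lam)
            + lam ** 2 * r * math.log(1.0 - (u / lam) ** 2))


def saturated_control(g_grad_v, lam, r=1.0):
    """Minimizer of input_penalty(u) + g_grad_v * u; always within [-lam, lam].
    `g_grad_v` stands for the scalar product of the input gain and the
    value-function gradient (a hypothetical input for this sketch)."""
    return -lam * math.tanh(g_grad_v / (2.0 * lam * r))


if __name__ == "__main__":
    a, A = -1.0, 2.0
    s = barrier(0.3, a, A)            # unconstrained coordinate
    print(barrier_inverse(s, a, A))   # recovers the constrained state
    print(saturated_control(50.0, lam=1.0))  # stays inside the input bound
```

Under these assumptions, a learning algorithm can work in the transformed coordinate `s`, which is unconstrained, while any value of `s` maps back to a state strictly inside (a, A). Likewise, the tanh form of the control keeps the input within the saturation bound by construction rather than by clipping.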
