Continuous-Time Safe Learning with Temporal Logic Constraints in Adversarial Environments

This paper investigates a safe learning problem subject to linear temporal logic (LTL) constraints under persistent adversarial inputs, with quantified performance and robustness. Via a finite state automaton, the LTL specification is first decomposed into a sequence of two-point boundary value problems (TPBVPs), each of which has an invariant safety zone. We then employ a system transformation, based on logarithmic barrier and hyperbolic-type functions, that guarantees state and control safety against a worst-case adversarial input attempting to push the system outside the safety set. A safe learning method solves each sub-problem, in which the actors (approximators of the optimal control and of the worst-case adversarial input) and the critic (approximator of the cost) are tuned to learn the optimal policies without violating safety. Finally, a Lyapunov stability analysis proves boundedness of the closed-loop system, and simulation results validate the effectiveness of the approach.
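As a concrete illustration of the kind of transformation the abstract refers to, the sketch below shows a logarithmic barrier map that sends a bounded state interval onto the whole real line (so unconstrained learning in the transformed coordinate can never violate the state bound) together with a hyperbolic-type squashing for input saturation. The bounds `a`, `b`, `u_max` and the function names are illustrative assumptions, not the paper's exact construction.

```python
import math

# Assumed state constraint x in (a, b) with a < 0 < b, and control bound
# |u| < u_max. These particular forms are a common choice in barrier-based
# safe learning, sketched here for illustration only.

def barrier(x, a, b):
    """Logarithmic barrier map: (a, b) -> (-inf, +inf), with barrier(0) = 0.
    s -> -inf as x -> a and s -> +inf as x -> b, so any finite trajectory
    in the transformed coordinate s stays strictly inside (a, b)."""
    return math.log((b * (x - a)) / (a * (x - b)))

def barrier_inv(s, a, b):
    """Inverse of the barrier map: recover the constrained state x from s."""
    return a * b * (math.exp(s) - 1.0) / (a * math.exp(s) - b)

def saturate(v, u_max):
    """Hyperbolic-type squashing: any unconstrained v gives |u| < u_max."""
    return u_max * math.tanh(v / u_max)
```

The dynamics are rewritten in the `s` coordinate, the learning laws are run there, and `barrier_inv` maps the result back to the physical state, which is how state safety is enforced by construction rather than by penalty.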
