Safety-Critical Online Control with Adversarial Disturbances

This paper studies the control of safety-critical dynamical systems in the presence of adversarial disturbances. We seek to synthesize state-feedback controllers that minimize a cost incurred due to the disturbance while respecting a safety constraint. The safety constraint is given by a bound on an $\mathcal{H}_\infty$ norm, while the cost is specified as an upper bound on an $\mathcal{H}_2$ norm of the system. We consider an online setting in which the cost at each time is revealed only after the controller for that time has been chosen. We propose an iterative approach that synthesizes the controller by solving a modified discrete-time Riccati equation, whose solutions enforce the safety constraint. We compare the cost of this controller with that of the optimal controller computed with complete knowledge of the disturbances and costs in hindsight. We show that the regret, defined as the difference between these costs, grows logarithmically with the time horizon. We validate our approach on a process control setup that is subject to two kinds of adversarial attacks.
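As an illustration of the kind of recursion involved (a minimal sketch, assuming a linear system $x_{t+1} = A x_t + B u_t + D w_t$, stage cost $x_t^\top Q x_t + u_t^\top R u_t$, and an attenuation level $\gamma$ for the $\mathcal{H}_\infty$ constraint; this is the standard full-information $\mathcal{H}_\infty$-type Riccati recursion, not necessarily the specific modified equation derived in the paper):

\[
P_{k+1} \;=\; Q + A^\top P_k \Bigl( I + \bigl( B R^{-1} B^\top - \gamma^{-2} D D^\top \bigr) P_k \Bigr)^{-1} A .
\]

When this iteration converges to a stabilizing fixed point $P$ with the indicated inverse well defined, the associated state-feedback gain keeps the closed-loop $\mathcal{H}_\infty$ norm from the disturbance to the regulated output below $\gamma$, which is the role the safety constraint plays above.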
