Regret Bounds for LQ Adaptive Control Under Database Attacks (Extended Version)

This paper is concerned with understanding and countering the effects of database attacks on a learning-based linear quadratic (LQ) adaptive controller. Such an attack targets neither sensors nor actuators; it only poisons the learning algorithm and the parameter estimator that are part of the regulation scheme. We focus on the adaptive optimal control algorithm introduced by Abbasi-Yadkori and Szepesvári, and we provide a regret analysis in the presence of attacks as well as modifications that mitigate their effects. A core step of this algorithm is regularized online least-squares estimation, whose self-normalized error bound yields, with high probability, a tight confidence set around the true system parameters. In the absence of malicious data injection, this set provides an adequate parameter estimate for control design. In the presence of an attack, however, the confidence set is no longer reliable. We therefore first address how to adjust the confidence set so that it compensates for the effect of the poisoned data. We then quantify the deleterious effect of this type of attack on the optimality of the control policy through a measure that we call attack regret.
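The estimation step described above can be sketched in Python with NumPy. This is a minimal, non-authoritative illustration. It assumes the standard setup x_{t+1} = A x_t + B u_t + w_t with unknown Θ_* = [A B]^T, σ-sub-Gaussian noise, and a known bound S ≥ ||Θ_*||_F. The radius `beta` follows the form given by Abbasi-Yadkori and Szepesvári (2011). The function `attacked_radius_sq` is a hypothetical, deliberately crude adjustment for a simplified attack model in which the adversary only adds perturbations of norm at most ε to the stored next states. It is not the adjustment derived in this paper. All function names are illustrative.

```python
import numpy as np


def rls_estimate(Z, X, lam):
    """Regularized least-squares estimate of Theta = [A B]^T.

    Z: (t, n+d) array with rows z_s = [x_s; u_s]; X: (t, n) array with rows x_{s+1}.
    Returns (Theta_hat, V) where V = lam*I + Z^T Z.
    """
    V = lam * np.eye(Z.shape[1]) + Z.T @ Z
    Theta_hat = np.linalg.solve(V, Z.T @ X)
    return Theta_hat, V


def beta(V, lam, n, sigma, S, delta):
    """Squared radius beta_t(delta) in the form of Abbasi-Yadkori and Szepesvari (2011).

    sigma: sub-Gaussian noise constant; S: bound on ||Theta_*||_F (both assumed known).
    """
    p = V.shape[0]
    _, logdet_V = np.linalg.slogdet(V)                 # numerically stable log det(V)
    log_ratio = 0.5 * (logdet_V - p * np.log(lam))      # log(det(V)^1/2 / det(lam*I)^1/2)
    root = n * sigma * np.sqrt(2.0 * (log_ratio + np.log(1.0 / delta)))
    return (root + np.sqrt(lam) * S) ** 2


def in_confidence_set(Theta, Theta_hat, V, radius_sq):
    """True if tr((Theta - Theta_hat)^T V (Theta - Theta_hat)) <= radius_sq."""
    D = Theta - Theta_hat
    return float(np.trace(D.T @ V @ D)) <= radius_sq


def attacked_radius_sq(radius_sq, t, eps):
    """HYPOTHETICAL inflation, assuming the attacker only adds e_s with
    ||e_s||_2 <= eps to the t stored next states (regressors untouched).
    The poisoned estimate shifts by V^{-1} Z^T E, and
    ||V^{-1/2} Z^T E||_F <= ||E||_F <= sqrt(t) * eps, because ||Z V^{-1} Z^T||_2 < 1.
    """
    return (np.sqrt(radius_sq) + np.sqrt(t) * eps) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    Theta_star = np.hstack([A, B]).T  # shape (n + d, n) = (3, 2)
    t, sigma, lam, eps, delta = 500, 0.1, 1.0, 0.05, 0.05

    # Simulate x_{s+1} = A x_s + B u_s + w_s under random exploratory inputs.
    x, Z, X = np.zeros(2), [], []
    for _ in range(t):
        u = rng.standard_normal(1)
        x_next = A @ x + B @ u + sigma * rng.standard_normal(2)
        Z.append(np.concatenate([x, u]))
        X.append(x_next)
        x = x_next
    Z, X = np.array(Z), np.array(X)

    # Poison only the stored next states with perturbations of norm exactly eps.
    E = rng.standard_normal(X.shape)
    E = eps * E / np.linalg.norm(E, axis=1, keepdims=True)

    Theta_hat, V = rls_estimate(Z, X + E, lam)
    b = beta(V, lam, n=2, sigma=sigma, S=np.linalg.norm(Theta_star), delta=delta)
    print("clean radius covers Theta_*:   ",
          in_confidence_set(Theta_star, Theta_hat, V, b))
    print("inflated radius covers Theta_*:",
          in_confidence_set(Theta_star, Theta_hat, V, attacked_radius_sq(b, t, eps)))
```

The inflation term grows like √t·ε, so it is conservative. It only illustrates how a bound on the poisoned data can be translated into a larger confidence radius.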

[1] Kyriakos G. Vamvoudakis et al., A Moving Target Defense Control Framework for Cyber-Physical Systems, 2020, IEEE Transactions on Automatic Control.

[2] Percy Liang et al., Certified Defenses for Data Poisoning Attacks, 2017, NIPS.

[3] Csaba Szepesvári et al., Online learning for linearly parametrized control problems, 2012.

[4] Frank L. Lewis et al., Game Theory-Based Control System Algorithms with Real-Time Reinforcement Learning: How to Solve Multiplayer Games Online, 2017, IEEE Control Systems.

[5] P. Kumar et al., Adaptive Linear Quadratic Gaussian Control: The Cost-Biased Approach Revisited, 1998.

[6] Ali Davoudi et al., Resilient and Robust Synchronization of Multiagent Systems Under Attacks on Sensors and Actuators, 2020, IEEE Transactions on Cybernetics.

[7] Mohamad Kazem Shirani Faradonbeh et al., Regret Analysis for Adaptive Linear-Quadratic Policies, 2017.

[8] Csaba Szepesvári et al., Regret Bounds for the Adaptive Control of Linear Quadratic Systems, 2011, COLT.

[9] Yishay Mansour et al., Learning Linear-Quadratic Regulators Efficiently with only √T Regret, 2019, arXiv.

[10] Yilin Mo et al., False Data Injection Attacks in Control Systems, 2010.

[11] Insup Lee et al., Cyber-physical systems: The next computing revolution, 2010, Design Automation Conference.

[12] Ambuj Tewari et al., Optimism-Based Adaptive Regulation of Linear-Quadratic Systems, 2017, IEEE Transactions on Automatic Control.

[13] Yixin Yin et al., Data-Driven Integral Reinforcement Learning for Continuous-Time Non-Zero-Sum Games, 2019, IEEE Access.

[14] Blaine Nelson et al., Poisoning Attacks against Support Vector Machines, 2012, ICML.

[15] Adel Javanmard et al., Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems, 2012, NIPS.

[16] João P. Hespanha et al., Cooperative Q-Learning for Rejection of Persistent Adversarial Inputs in Networked Linear Quadratic Systems, 2018, IEEE Transactions on Automatic Control.

[17] S. Shankar Sastry et al., Secure Control: Towards Survivable Cyber-Physical Systems, 2008, 28th International Conference on Distributed Computing Systems Workshops.

[18] Csaba Szepesvári et al., Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems, 2011, arXiv.