Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation

In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG) agent through manipulation of its cost signals. We show that a small falsification of the cost parameters leads only to a bounded change in the optimal policy, and that the bound is linear in the magnitude of falsification the attacker applies to the cost parameters. We propose an attack model in which the attacker aims to mislead the agent into learning a 'nefarious' policy by intentionally falsifying the cost parameters. We formulate the attacker's problem as a convex optimization problem and develop necessary and sufficient conditions for checking the achievability of the attacker's goal. We showcase the adversarial manipulation on two types of LQG learners: a batch RL learner and an adaptive dynamic programming (ADP) learner. Our results demonstrate that with only 2.296% falsification of the cost data, the attacker misleads the batch RL learner into learning a 'nefarious' policy that drives the vehicle to a dangerous position. The attacker can also gradually trick the ADP learner into learning the same 'nefarious' policy by consistently feeding it a falsified cost signal that stays close to the true cost signal. This paper aims to raise awareness of the security threats faced by RL-enabled control systems.
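The sensitivity result above can be illustrated with a minimal sketch (not the paper's exact formulation): for a discrete-time LQR problem, a small perturbation of the state-cost matrix Q yields only a small change in the optimal feedback gain K. The double-integrator system, the perturbation direction, and the value of `eps` below are hypothetical choices for illustration.

```python
# Illustrative sketch: how a small falsification of LQR cost parameters
# (here, the state-cost matrix Q) perturbs the optimal feedback gain K.
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Optimal gain K for x_{t+1} = A x_t + B u_t with stage cost x'Qx + u'Ru."""
    P = solve_discrete_are(A, B, Q, R)          # solve the discrete-time Riccati equation
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Toy double-integrator dynamics (hypothetical example system).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)            # true state cost
R = np.eye(1)            # true control cost

eps = 0.02               # small cost falsification
Q_fake = Q + eps * np.array([[1.0, 0.0],
                             [0.0, -0.5]])     # stays positive definite

K_true = lqr_gain(A, B, Q, R)
K_fake = lqr_gain(A, B, Q_fake, R)

# The induced policy change is small, consistent with a bound linear in eps.
print(np.linalg.norm(K_fake - K_true))
```

Repeating the experiment for a range of `eps` values shows the gain deviation shrinking roughly proportionally, which is the qualitative behavior the paper's linear bound describes.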
