Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals

This paper studies reinforcement learning (RL) under malicious falsification of cost signals and introduces a quantitative framework of attack models to understand the vulnerabilities of RL. Focusing on $Q$-learning, we show that $Q$-learning algorithms converge under stealthy attacks and bounded falsifications of cost signals. We characterize the relation between the falsified cost and the $Q$-factors, as well as the policy learned by the agent, which provides fundamental limits on feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We also provide conditions on the falsified cost under which the agent is misled into learning the adversary's favored policy. A numerical case study of water reservoir control shows the potential hazards of RL in learning-based control systems and corroborates the results.
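
To make the setting concrete, the following is a minimal sketch of tabular $Q$-learning in which the agent only ever observes a cost table supplied by an adversary. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state, two-action MDP, the cost values, the discount factor, the constant step size, and the names `NEXT_STATE`, `TRUE_COST`, `q_learning`, and `greedy_policy`. It does not reproduce the paper's attack models or its bounds on the robust region; it only shows that a small falsification can leave the learned policy unchanged while a larger one can change it.

```python
import random

GAMMA = 0.9  # discount factor (assumed)

# Hypothetical deterministic MDP: action 0 stays in the current state,
# action 1 moves to the other state. NEXT_STATE[s][a] is the next state.
NEXT_STATE = [[0, 1], [1, 0]]

# Hypothetical true cost c(s, a); the agent wants to minimize discounted cost.
TRUE_COST = [[1.0, 2.0], [3.0, 1.0]]


def q_learning(observed_cost, steps=20000, alpha=0.5, seed=0):
    """Tabular Q-learning driven by the cost table the agent observes.

    State-action pairs are sampled uniformly at random so that every pair
    is updated many times. A constant step size is used here because the
    toy environment is deterministic.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        s = rng.randrange(2)
        a = rng.randrange(2)
        s_next = NEXT_STATE[s][a]
        # Cost-minimizing target: observed cost plus discounted best Q-factor.
        target = observed_cost[s][a] + GAMMA * min(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
    return q


def greedy_policy(q):
    """Return the cost-minimizing action in each state."""
    return [row.index(min(row)) for row in q]


# The adversary falsifies only the cost of staying in state 0.
mild_falsification = [[1.2, 2.0], [3.0, 1.0]]    # small perturbation
strong_falsification = [[4.0, 2.0], [3.0, 1.0]]  # large perturbation

policy_true = greedy_policy(q_learning(TRUE_COST))
policy_mild = greedy_policy(q_learning(mild_falsification))
policy_strong = greedy_policy(q_learning(strong_falsification))

print("true cost:           ", policy_true)
print("mild falsification:  ", policy_mild)
print("strong falsification:", policy_strong)
```

In this toy instance the iteration converges under all three cost tables. The mild falsification yields the same greedy policy as the true cost, whereas the strong falsification makes staying in state 0 look expensive enough that the agent learns to leave it.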
