Deep reinforcement learning based optimal defense for cyber-physical system in presence of unknown cyber-attack

In this paper, the online optimal cyber-defense problem is investigated for Cyber-Physical Systems (CPS) under unknown cyber-attacks. First, a novel cyber-state dynamics model is developed that evaluates, in real time, the impact of the current cyber-attack and defense strategies. Next, using game theory, the ideal optimal defense design is obtained under full knowledge of the cyber-state dynamics. To relax this requirement on the cyber-state dynamics, a game-theoretic actor-critic neural network (NN) structure is developed to learn the optimal cyber-defense strategy online. Moreover, to further improve the practicality of the developed scheme, a novel deep reinforcement learning (DRL) algorithm is designed and implemented within the actor-critic NN structure. Finally, numerical simulations demonstrate that the proposed DRL-based optimal defense strategy can not only defend the CPS online, even in the presence of unknown cyber-attacks, but also learn the optimal defense policy more accurately and in a more timely manner.
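The abstract says that, when the cyber-state dynamics are fully known, the ideal optimal defense follows from a game-theoretic design, and that an actor-critic structure learns that strategy online when they are not. The following is a minimal sketch of the model-based baseline only, under strong simplifying assumptions that are not taken from the paper: the cyber-state dynamics are replaced by a hypothetical linear model dx/dt = Ax + Bu + Dw, with defense input u and attack input w, and the defender minimizes (while the attacker maximizes) the integral of x'Qx + u'Ru - gamma^2 w'w. Policy iteration alternates policy evaluation (a Lyapunov equation, the role a critic NN approximates from data) and policy improvement (the gain update, the role of the actor). For this linear-quadratic case, each sweep coincides with a Newton step on the game algebraic Riccati equation (GARE). All function names, matrices, and the value of gamma are illustrative.

```python
import numpy as np


def solve_lyapunov(Acl, M):
    """Solve Acl^T P + P Acl + M = 0 for symmetric P (Kronecker form; fine for small n)."""
    n = Acl.shape[0]
    I = np.eye(n)
    # Column-major vec identities: vec(Acl^T P) = (I kron Acl^T) vec(P),
    # vec(P Acl) = (Acl^T kron I) vec(P).
    L = np.kron(I, Acl.T) + np.kron(Acl.T, I)
    p = np.linalg.solve(L, -M.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def game_policy_iteration(A, B, D, Q, R, gamma, iters=50, tol=1e-10):
    """Policy iteration for the hypothetical zero-sum game
    min_u max_w integral(x'Qx + u'Ru - gamma^2 w'w) dt, dx/dt = Ax + Bu + Dw.
    Starts from P = 0 (no defense, no attack), so A itself must be Hurwitz."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    P = np.zeros((n, n))
    for _ in range(iters):
        K = Rinv @ B.T @ P          # policy improvement: defense u = -K x
        L = D.T @ P / gamma**2      # policy improvement: worst-case attack w = L x
        Acl = A - B @ K + D @ L
        # Policy evaluation: cost matrix of the current (u, w) pair.
        P_next = solve_lyapunov(Acl, Q + K.T @ R @ K - gamma**2 * L.T @ L)
        done = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if done:
            break
    K = Rinv @ B.T @ P
    L = D.T @ P / gamma**2
    return P, K, L


def gare_residual(P, A, B, D, Q, R, gamma):
    """Residual of the game algebraic Riccati equation (zero at the saddle point)."""
    S = B @ np.linalg.inv(R) @ B.T - D @ D.T / gamma**2
    return A.T @ P + P @ A + Q - P @ S @ P


if __name__ == "__main__":
    # Hypothetical 2-state linear "cyber-state" model, NOT the paper's dynamics.
    A = np.array([[-1.0, 1.0], [0.0, -2.0]])
    B = np.array([[0.0], [1.0]])   # defense input channel
    D = np.array([[1.0], [0.0]])   # attack input channel
    Q, R, gamma = np.eye(2), np.eye(1), 5.0
    P, K, L = game_policy_iteration(A, B, D, Q, R, gamma)
    print("P =", P)
    print("defense gain K =", K, "attack gain L =", L)
    print("GARE residual:", np.max(np.abs(gare_residual(P, A, B, D, Q, R, gamma))))
```

Starting from P = 0 requires the undefended model to be stable, and convergence is not guaranteed for every gamma: if gamma is too small (a very strong attacker), the saddle point may not exist and the Lyapunov solves can fail. A data-driven variant in the spirit of the abstract would replace the Lyapunov solve with a critic fitted from measured trajectories instead of using A, B, and D directly.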
