Implementing SCADA Scenarios and Introducing Attacks to Obtain Training Data for Intrusion Detection Methods

Hardly any publicly available data sets can be used to evaluate intrusion detection algorithms. The biggest threat to industrial applications arises from state-sponsored and criminal groups. These attackers often employ previously unknown exploits, so-called zero-day exploits, which cannot be discovered with signature-based intrusion detection. Statistical or machine-learning-based anomaly detection is therefore a natural fit. These methods in particular, however, need a large amount of labelled training data. In this work, an exemplary industrial use case with real-world industrial hardware is presented. Siemens S7 Programmable Logic Controllers control an application modelled on a real-world process, using the OPC UA protocol: a pump filling and emptying water tanks. This scenario is used to generate application-specific network data. Furthermore, attacks are introduced into this data set in three ways. First, the normal process is monitored and captured, and common attacks are then synthetically introduced into the captured data. Second, malicious behaviour is implemented in the Programmable Logic Controller program and executed live while the traffic is captured. Third, malicious behaviour is implemented on the Programmable Logic Controller while keeping the same output behaviour as in normal operation, since an attacker could exploit an application but forge valid sensor output so that no anomaly is detected. Sensors capturing temperature, sound and water flow are employed to create data that can be correlated with the network data and used to detect the attack nonetheless. All data is labelled with the ground truth, meaning all attacks are known and no unknown attacks occur. This makes the data well suited for training anomaly detection algorithms. The data is published to enable security researchers to evaluate intrusion detection solutions.
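The first of the three approaches, synthetically introducing attacks into a captured normal trace while keeping the ground-truth labels, can be illustrated with a minimal sketch. It rests on assumptions not taken from this work: the `Record` fields, the `inject_flood` helper, the IP addresses and the choice of a flooding attack are all hypothetical and only stand in for whatever traffic format and attacks the data set actually uses.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One captured packet, reduced to a few illustrative fields (hypothetical)."""
    timestamp: float  # seconds since the start of the capture
    src: str          # source address
    dst: str          # destination address
    length: int       # packet length in bytes
    label: str        # ground truth: "normal" or the name of an attack


def inject_flood(normal, start, duration, rate, src, dst, seed=0):
    """Merge a synthetic flooding burst into a captured trace.

    The original records are left untouched; every injected record carries
    the label "flood", so the ground truth of the merged trace is known.
    """
    rng = random.Random(seed)  # fixed seed keeps the generated data reproducible
    count = int(duration * rate)
    attack = [
        Record(start + rng.uniform(0.0, duration), src, dst, 64, "flood")
        for _ in range(count)
    ]
    # Re-sort by time so the result looks like a single capture.
    return sorted(normal + attack, key=lambda r: r.timestamp)


# Assumed normal operation: one message per second from a PLC to a client.
normal = [Record(float(t), "10.0.0.2", "10.0.0.1", 120, "normal") for t in range(10)]

# Assumed attack: 50 packets per second for 2 seconds from an unknown host.
mixed = inject_flood(normal, start=3.0, duration=2.0, rate=50,
                     src="10.0.0.99", dst="10.0.0.1")
```

Because each record carries its own label, the merged trace can be split back into normal and attack traffic at any time, which is what makes such data usable for training and evaluating anomaly detection.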
