PPFSCADA: Privacy preserving framework for SCADA data publishing

Supervisory Control and Data Acquisition (SCADA) systems control and monitor industrial and critical infrastructure functions, such as electricity, gas, water, waste, railway, and traffic. Recent attacks on SCADA systems highlight the need for stronger SCADA security. Sharing SCADA traffic data has therefore become vital for analyzing security risks and developing appropriate security solutions. However, inappropriate sharing and usage of SCADA data could threaten the privacy of companies and discourage them from sharing data. In this paper, we present a privacy-preserving, strategy-based permutation technique called the PPFSCADA framework, in which data privacy, statistical properties and data mining utilities can be controlled at the same time. In particular, our proposed approach involves: (i) vertically partitioning the original data set to improve the performance of perturbation; (ii) developing a framework to deal with various types of network traffic data, including numerical, categorical and hierarchical attributes; (iii) grouping the partitioned sets into a number of clusters based on the proposed framework; and (iv) perturbing the data by replacing each original attribute value with a new value (the centroid of its cluster). The effectiveness of the proposed PPFSCADA framework is shown through several experiments on simulated SCADA, intrusion detection and network traffic data sets. Through experimental analysis, we show that PPFSCADA deals effectively with multivariate traffic attributes, produces results comparable to those obtained on the original data, substantially improves the performance of the five supervised approaches, and provides a high level of privacy protection.
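Steps (i), (iii) and (iv) can be illustrated with a minimal sketch. It is not the PPFSCADA implementation: it assumes purely numerical attributes, uses plain k-means as a stand-in for the framework's clustering step (the abstract does not name the algorithm), and the names `kmeans`, `perturb`, `partitions` and `k` are hypothetical. Categorical and hierarchical attributes would need the framework's own similarity measures, which are not described here.

```python
import random

def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means over equal-length numeric tuples (assumed stand-in)."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])))
            for r in rows
        ]
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def perturb(data, partitions, k, seed=0):
    """Replace each value by the centroid of its cluster, one column group at a time.

    data: list of rows (lists of floats).
    partitions: list of column-index lists (the vertical partitions).
    """
    out = [list(r) for r in data]
    for cols in partitions:
        # (i) vertical partition: project every record onto this column group.
        sub = [tuple(r[c] for c in cols) for r in data]
        # (iii) cluster the projected records.
        labels, centroids = kmeans(sub, k, seed=seed)
        # (iv) publish the cluster centroid instead of the original values.
        for i, label in enumerate(labels):
            for j, c in enumerate(cols):
                out[i][c] = centroids[label][j]
    return out

# Toy records with four numerical attributes (made-up values).
data = [
    [1.0, 2.0, 100.0, 5.0],
    [1.2, 1.9, 110.0, 5.5],
    [0.9, 2.1, 400.0, 9.0],
    [8.0, 9.0, 420.0, 9.5],
    [8.3, 9.2, 105.0, 5.2],
    [7.9, 8.8, 390.0, 8.8],
]
published = perturb(data, partitions=[[0, 1], [2, 3]], k=2)
for row in published:
    print(row)
```

Because every record in a cluster is replaced by that cluster's mean, the per-attribute means of the published data equal those of the original, while each column group carries at most `k` distinct value combinations. This is one simple way to see how privacy and statistical properties can be traded off through the number of clusters.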
