Group-Wise Principal Component Analysis for Exploratory Intrusion Detection

Intrusion detection is an important layer of cybersecurity that helps prevent hacking and other illegal activity on corporate assets. Anomaly-based Intrusion Detection Systems perform an unsupervised analysis of data collected from the network and end systems in order to identify singular events. While this approach may produce many false alarms, it can also identify new (zero-day) security threats. In this context, multivariate approaches such as Principal Component Analysis (PCA) have provided promising results in the past. PCA can be used in exploratory mode or in learning mode. Here, we propose an exploratory intrusion detection approach that replaces PCA with Group-wise PCA (GPCA), a recently proposed data analysis technique with additional exploratory characteristics. A main advantage of GPCA over PCA is that it yields simple models that are easy to understand for security professionals not trained in multivariate tools. In addition, the intrusion detection workflow with GPCA is more coherent with the dominant strategies in intrusion detection. We illustrate the application of GPCA in two case studies.
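
As a rough illustration of the PCA baseline mentioned above (not of GPCA itself, and not the authors' implementation), the following minimal sketch shows one common way to use PCA for exploratory anomaly detection: fit a PCA model to the data and inspect two per-observation statistics, the distance within the model subspace (often called the D-statistic or Hotelling's T²) and the squared residual outside it (the Q-statistic). The synthetic data, the function name `pca_anomaly_scores`, the number of components, and the injected anomaly at row 57 are all assumptions made for the example.

```python
import numpy as np

def pca_anomaly_scores(X, n_components):
    """Return per-observation (D, Q) statistics from a PCA model of X.

    D measures how extreme an observation is inside the PCA subspace;
    Q measures how much of it the PCA subspace fails to explain.
    """
    # Auto-scale: zero mean and unit variance per variable.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant variables
    Z = (X - mean) / std

    # PCA via the singular value decomposition of the scaled data.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                    # loadings: variables x components
    T = Z @ P                                  # scores: observations x components
    score_var = s[:n_components] ** 2 / (Z.shape[0] - 1)

    D = np.sum(T ** 2 / score_var, axis=1)     # variance-scaled distance in the subspace
    residuals = Z - T @ P.T                    # part of Z not captured by the model
    Q = np.sum(residuals ** 2, axis=1)         # squared residual per observation
    return D, Q

# Hypothetical data: 200 observations of 6 correlated variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 6))

# Inject one anomaly that breaks the correlation structure (assumed for illustration).
X[57, 2] += 8.0

D, Q = pca_anomaly_scores(X, n_components=2)
suspect = int(np.argmax(Q))
print("Observation with the largest residual:", suspect)
```

In an exploratory workflow, an analyst would typically plot D and Q for all observations and then examine which variables contribute most to the flagged ones. That diagnosis step is where simpler, sparser models are easier to interpret.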
