PCA-based Hotelling's T2 chart with fast minimum covariance determinant (FMCD) estimator and kernel density estimation (KDE) for network intrusion detection

Abstract

In this work, Principal Component Analysis (PCA) is combined with Hotelling's T2 chart to handle the many highly correlated network traffic features and to reduce computational time without sacrificing detection accuracy. This combination, however, raises a new issue: network traffic observations rarely follow the multivariate normal distribution that Hotelling's T2 chart assumes. As a result, the chart produces many false alarms when used for network intrusion detection. To address this, Kernel Density Estimation (KDE) is applied to obtain an optimal control limit. In addition, to improve detection accuracy, the Fast Minimum Covariance Determinant (FMCD) estimator is employed to obtain a robust mean vector and covariance matrix. Experiments on simulated data assess the proposed chart's performance in detecting outliers in both normal and non-normal multivariate data. In these simulation studies, the proposed chart yields higher accuracy and a high detection rate with a low false alarm rate. The proposed Intrusion Detection System (IDS) is then applied to scanning network traffic for attacks, with the well-known KDD99 dataset used as a benchmark for a fair comparison against several other algorithms. The monitoring results show that the proposed approach reduces computational time, achieving a time efficiency of 87.42%. Compared with other control charts, the proposed chart yields a lower False Negative Rate (FNR) in detecting intrusions, and compared with several classifiers it achieves a higher accuracy of about 0.9893.
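The monitoring pipeline summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses scikit-learn's `MinCovDet` (an implementation of the FAST-MCD algorithm) for the robust mean vector and covariance matrix, and SciPy's `gaussian_kde` for the control limit. It also assumes that PCA is fit on standardized in-control (Phase I) data and that FMCD is then applied to the retained PC scores; the paper's exact ordering may differ. The class name `PcaMcdT2Chart`, the 90% cumulative-variance rule for retaining components, the false-alarm level α = 0.01, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gaussian_kde
from sklearn.covariance import MinCovDet
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


class PcaMcdT2Chart:
    """Hypothetical sketch: PCA-based Hotelling's T2 chart with FAST-MCD
    estimates and a KDE-based upper control limit (UCL)."""

    def __init__(self, var_explained=0.90, alpha=0.01, random_state=0):
        self.var_explained = var_explained  # assumed PC-retention rule
        self.alpha = alpha                  # assumed target false-alarm rate
        self.random_state = random_state

    def fit(self, X_in_control):
        # Phase I: standardize features, then keep the leading PCs that
        # explain at least `var_explained` of the total variance.
        self.scaler_ = StandardScaler().fit(X_in_control)
        Z = self.scaler_.transform(X_in_control)
        self.pca_ = PCA(n_components=self.var_explained).fit(Z)
        scores = self.pca_.transform(Z)

        # FAST-MCD (scikit-learn's MinCovDet): robust mean vector and
        # covariance matrix of the PC scores.
        self.mcd_ = MinCovDet(random_state=self.random_state).fit(scores)
        t2 = self.mcd_.mahalanobis(scores)  # squared distances = T2 values

        # KDE of the Phase I T2 values; the UCL is the (1 - alpha) quantile
        # of the estimated density, found by root-finding on its CDF.
        kde = gaussian_kde(t2)
        bw = float(np.sqrt(kde.covariance[0, 0]))  # kernel standard deviation

        def cdf_gap(c):
            return kde.integrate_box_1d(-np.inf, c) - (1.0 - self.alpha)

        # The KDE CDF is at most 0.5 at min(t2) and ~1 at max(t2) + 10*bw,
        # so the root is bracketed for any alpha < 0.5.
        self.ucl_ = brentq(cdf_gap, t2.min(), t2.max() + 10.0 * bw)
        return self

    def t2(self, X):
        """Phase II: T2 statistic of new observations."""
        scores = self.pca_.transform(self.scaler_.transform(X))
        return self.mcd_.mahalanobis(scores)

    def predict(self, X):
        """True where an observation signals (T2 above the UCL)."""
        return self.t2(X) > self.ucl_


# Usage on simulated, strongly correlated data (illustrative only).
rng = np.random.default_rng(0)
p = 5
cov = 0.8 * np.ones((p, p)) + 0.2 * np.eye(p)
X_normal = rng.multivariate_normal(np.zeros(p), cov, size=500)
X_shifted = rng.multivariate_normal(np.full(p, 6.0), cov, size=50)

chart = PcaMcdT2Chart().fit(X_normal)
false_alarm_rate = chart.predict(X_normal).mean()
detection_rate = chart.predict(X_shifted).mean()
print(f"UCL={chart.ucl_:.2f}  FAR={false_alarm_rate:.3f}  DR={detection_rate:.3f}")
```

Reading the UCL off the estimated density of the in-control T2 values, rather than from an F or chi-square quantile, avoids relying on multivariate normality. The FAST-MCD estimates keep outliers in the Phase I data from inflating the covariance matrix, which would otherwise lower T2 values and mask later signals.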
