A study on flow features selection for malicious activities detection in software defined networks

The SDN architecture opens new opportunities for implementing security techniques. Statistics can be collected from devices deployed across the network and passed to a controller, where the collected traffic data can be processed and used for threat detection, which significantly improves detection capabilities. MADMAS (Monitoring and Detection of Malicious Activities in SDN), a system introduced by the authors for detecting malicious activities in software defined networks, builds on native SDN mechanisms and uses data exploration techniques to identify and process features and to classify network traffic. In this paper, we show that appropriate selection and processing of flow features enables effective classification of SDN traffic. We also demonstrate the benefits of using Independent Component Analysis (ICA) and Principal Component Analysis (PCA) for feature space reduction.
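
To make the idea of feature space reduction concrete, the following is a minimal sketch of PCA applied to per-flow statistics. It is not the MADMAS implementation: the feature set (packet count, byte count, duration), the sample values, and all function names are hypothetical and chosen only for illustration. The first principal component is found by power iteration on the covariance matrix of the standardized features, using only the Python standard library.

```python
import math

def standardize(rows):
    """Center each column to zero mean and scale it to unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero spread; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Power iteration: approximates the dominant eigenvector of cov."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, v):
    """Project each row onto direction v, giving one score per flow."""
    return [sum(x * c for x, c in zip(r, v)) for r in rows]

# Hypothetical flow records: [packet count, byte count, duration in seconds]
flows = [
    [10, 1200, 0.5],
    [12, 1500, 0.6],
    [11, 1300, 0.4],
    [900, 54000, 0.1],
    [950, 57000, 0.2],
]

z = standardize(flows)
pc1 = first_component(covariance(z))
scores = project(z, pc1)  # three features reduced to a single value per flow
```

In this toy data the three low-volume flows and the two high-volume flows end up on opposite sides of zero along the first component, so a single derived feature already separates them. A classifier would then be trained on such reduced features instead of the raw flow statistics.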
