Analysis of Lightweight Feature Vectors for Attack Detection in Network Traffic

The consolidation of encryption and big data in network communications has made deep packet inspection infeasible in large networks. Early attack detection therefore requires feature vectors that are easy to extract, process, and analyze, and that can also be generated from encrypted traffic. So far, experts have selected features based on intuition, previous research, or uncritically adopted standards, but there is no general agreement on which features to use for attack detection in a broad scope. We compared five lightweight feature sets proposed in the scientific literature over the last few years and evaluated them with supervised machine learning. For our experiments, we used the UNSW-NB15 dataset, recently published as a new benchmark for network security. The results revealed three remarkable findings: (1) analysis based on source behavior instead of classic flow profiles is more effective for attack detection; (2) meta-studies of past research can be used to establish satisfactory benchmarks; and (3) features based on packet length are clearly decisive for capturing malicious activity. Our research showed that the feature vectors currently used for attack detection are oversized, that their accuracy and speed can be improved, and that they need to be adapted to deal with encrypted traffic.
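To illustrate what "source behavior instead of classic flow profiles" can mean in practice, here is a minimal sketch that aggregates per-flow records into one feature vector per source IP, including simple packet-length statistics. It assumes a hypothetical flow record with fields `src_ip`, `dst_ip`, `dst_port`, `packets`, and `octets`. These are illustrative names, not the UNSW-NB15 schema, and the selected features are not one of the five feature sets compared in the study. All features are header and size information, so they can be computed without payload inspection.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Flow:
    # Hypothetical minimal flow record (illustrative field names, not UNSW-NB15 columns).
    src_ip: str
    dst_ip: str
    dst_port: int
    packets: int
    octets: int


def source_vectors(flows):
    """Aggregate per-flow records into one feature vector per source IP."""
    groups = defaultdict(list)
    for f in flows:
        groups[f.src_ip].append(f)

    vectors = {}
    for src, fs in groups.items():
        # Average packet length of each flow (octets / packets); skip empty flows.
        pkt_lens = [f.octets / f.packets for f in fs if f.packets > 0]
        vectors[src] = {
            "n_flows": len(fs),
            "n_dst_ips": len({f.dst_ip for f in fs}),
            "n_dst_ports": len({f.dst_port for f in fs}),
            "mean_pkt_len": mean(pkt_lens) if pkt_lens else 0.0,
            "std_pkt_len": pstdev(pkt_lens) if pkt_lens else 0.0,
        }
    return vectors


# Usage: a "normal" client and a source probing consecutive ports with tiny packets.
flows = [
    Flow("10.0.0.1", "10.0.0.9", 80, 10, 5000),
    Flow("10.0.0.1", "10.0.0.9", 443, 4, 240),
] + [Flow("10.0.0.2", "10.0.0.8", port, 1, 60) for port in range(20, 25)]

vectors = source_vectors(flows)
for src, vec in vectors.items():
    print(src, vec)
```

Per flow, the probing packets look unremarkable; aggregated per source, the many distinct destination ports combined with constant, minimal packet lengths stand out. A supervised classifier would then be trained on such per-source vectors instead of individual flow records.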
