NTCS: A real time flow-based network traffic classification system

This work presents the design and implementation of a real-time, flow-based network traffic classification system. The classifier acts as a pipeline of three modules: packet capture and preprocessing, flow reassembly, and classification with Machine Learning (ML). The modules are built as concurrent processes with well-defined data interfaces between them, so that any module can be improved and updated independently. In this pipeline, flow reassembly is the performance bottleneck. The implementation uses an efficient reassembly method that results in an average delivery delay of approximately 0.49 seconds. For the classification module, the performance of the K-Nearest Neighbor (KNN), C4.5 Decision Tree, Naive Bayes (NB), Flexible Naive Bayes (FNB), and AdaBoost ensemble learning algorithms is compared in order to validate our approach.
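The three-module pipeline described above can be illustrated with a minimal sketch. It is not the NTCS implementation: threads and in-memory queues stand in for the concurrent processes and their data interfaces, the 5-tuple flow key is an assumed (though common) flow definition, flows are emitted only at the end of the input rather than in real time, and the size-threshold rule in `classify` is a hypothetical placeholder for a trained ML model. All function and field names are illustrative.

```python
import queue
import threading
from collections import defaultdict

SENTINEL = None  # marks end of stream between stages


def capture(packets, out_q):
    """Stage 1: stand-in for packet capture and preprocessing."""
    for pkt in packets:
        out_q.put(pkt)
    out_q.put(SENTINEL)


def reassemble(in_q, out_q):
    """Stage 2: group packets into flows keyed by an assumed 5-tuple."""
    flows = defaultdict(list)
    while (pkt := in_q.get()) is not SENTINEL:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key].append(pkt["size"])
    # Simplification: flows are delivered only once the input ends.
    for key, sizes in flows.items():
        out_q.put((key, sizes))
    out_q.put(SENTINEL)


def classify(in_q, results):
    """Stage 3: hypothetical threshold rule in place of a trained model."""
    while (item := in_q.get()) is not SENTINEL:
        key, sizes = item
        mean_size = sum(sizes) / len(sizes)
        results[key] = "bulk" if mean_size > 500 else "interactive"


def run_pipeline(packets):
    """Run the three stages concurrently, connected by queues."""
    q1, q2, results = queue.Queue(), queue.Queue(), {}
    stages = [
        threading.Thread(target=capture, args=(packets, q1)),
        threading.Thread(target=reassemble, args=(q1, q2)),
        threading.Thread(target=classify, args=(q2, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results
```

Because each stage only reads from one queue and writes to the next, any stage can be swapped out without touching the others, which mirrors the independence between modules that the abstract describes.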
