Classification and Analysis of Computer Network Traffic

Traffic monitoring and analysis can serve several purposes: investigating the usage of network resources, assessing the performance of network applications, adjusting Quality of Service (QoS) policies in the network, logging traffic to comply with the law, or creating realistic traffic models for academic purposes. The objective of this thesis is to find a way to evaluate the performance of various applications in a high-speed Internet infrastructure. To meet this objective, we needed to answer a number of research questions. Most of them concern techniques for traffic classification that can process large amounts of data in near real time using affordable CPU and memory resources. Other questions relate to methods for real-time estimation of the application QoS level based on the results obtained by the traffic classifier. This thesis focuses on traffic classification and analysis, while the work on QoS assessment is limited to defining its connections with traffic classification and proposing a general algorithm. We introduced the established methods for traffic classification (transport-layer port numbers, Deep Packet Inspection (DPI), and statistical classification) and assessed their usefulness in particular areas. We found that classification based on port numbers is no longer accurate because most applications use dynamic port numbers, while DPI is relatively slow, requires a lot of processing power, and raises significant privacy concerns. Statistical classifiers based on Machine Learning Algorithms (MLAs) were shown to be fast and accurate. At the same time, they consume few resources and do not raise privacy concerns. However, they require good-quality training data.
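The contrast between port-based and statistical classification can be illustrated with a minimal sketch. It is not the method developed in the thesis: the port table, the function names `classify_by_port` and `flow_features`, and the chosen features are assumptions made purely for illustration. The sketch shows why a port lookup fails for an application on a dynamic port, and which payload-independent per-flow quantities a statistical classifier might take as input.

```python
from statistics import mean

# Hypothetical, deliberately tiny port table (illustrative subset only).
WELL_KNOWN_PORTS = {22: "SSH", 53: "DNS", 80: "HTTP", 443: "HTTPS"}


def classify_by_port(dst_port):
    """Port-based classification: a plain lookup in a static table."""
    return WELL_KNOWN_PORTS.get(dst_port, "UNKNOWN")


def flow_features(packets):
    """Compute payload-independent features for one flow.

    `packets` is a list of (timestamp_seconds, size_bytes) tuples,
    ordered by time. Only sizes and timings are used, so no packet
    payload has to be inspected.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-arrival times between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packets": len(packets),
        "bytes": sum(sizes),
        "mean_size": mean(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }


if __name__ == "__main__":
    # An application on a dynamic port is invisible to the port lookup.
    print(classify_by_port(80))
    print(classify_by_port(51413))

    # The same flow can still be described by its statistical features.
    example_flow = [(0.0, 100), (0.5, 300), (1.0, 200)]
    print(flow_features(example_flow))
```

A feature dictionary of this kind would then be passed, together with a ground-truth label, to a machine learning algorithm for training. This is why the quality of the labeled training data matters as much as the choice of algorithm.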
We extensively tested widely used DPI classifiers (PACE, OpenDPI, L7-filter, nDPI, Libprotoident, and NBAR) and assessed their usefulness in generating ground truth that can be used as training data for MLAs. Our evaluation showed that the most accurate classifiers (PACE, nDPI, and Libprotoident) do not provide consistent output: the results are given on a mix of levels, namely application, content, content container, service provider, or transport-layer protocol. L7-filter and NBAR, on the other hand, provide results consistently on the application level, but their accuracy is too low to consider them as tools for generating ground truth. We also contributed to the open-source community by improving the accuracy of nDPI and designing future enhancements to make its classification consistent. Because the existing methods proved incapable of generating proper training data, we built our own host-based system for collecting and labeling network data, which depends on volunteers and which we therefore called Volunteer-Based
