Clustering Techniques for Traffic Classification: A Comprehensive Review

The threat of malicious content on a network requires network administrators and users to accurately identify desirable traffic flowing into their networks. To this end, several studies have classified traffic flows and applied traffic classification to intrusion detection, monitoring systems, and pattern detection in various networks. Research into clustering, a family of machine learning techniques, emerged in response to the inefficiencies and drawbacks of the traditional port-based and payload-based schemes. The classic K-means clustering technique, in combination with other methods and parameters, can be used to build newer unsupervised and semi-supervised approaches that improve the quality of service in networks. In this paper, we review twelve existing clustering techniques. The review covers their contributions to clustering methods, the challenges that remain, and recommendations for further research in clustering traffic flows.
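As a rough illustration of the classic K-means technique mentioned above, the following is a minimal sketch of Lloyd's algorithm applied to a handful of flows. It is not taken from any of the reviewed techniques: the two features (a normalized mean packet size and a normalized flow duration), the sample values, and the function name `kmeans` are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each flow to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Hypothetical flows: (normalized mean packet size, normalized duration).
flows = [
    (0.10, 0.20), (0.15, 0.25), (0.12, 0.18),  # e.g. short, small-packet flows
    (0.90, 0.80), (0.85, 0.95), (0.88, 0.90),  # e.g. long, bulk-transfer flows
]

centroids, labels = kmeans(flows, k=2)
print(labels)
```

The cluster indices themselves are arbitrary; only the grouping matters. In an unsupervised setting the clusters still have to be mapped to application classes, which is where the semi-supervised approaches covered in the review use a small number of labeled flows.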
