Multi-level Machine Learning Traffic Classification System

In this paper, we propose a novel framework for traffic classification that employs machine learning techniques and uses only packet header information. The framework consists of three key components. First, we combine clustering and classification algorithms to make the identification system robust under varying network conditions. Second, we introduce traffic granularity levels and propagate information between the levels to increase accuracy and accelerate classification. Third, we use customized constraints based on connection patterns to make efficient use of state-of-the-art clustering algorithms. The components of the framework are evaluated step by step to examine their contribution to the performance of the whole system.

Keywords: traffic classification; machine learning; packet header
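The idea of combining clustering with classification can be illustrated with a minimal sketch. It is not the framework proposed in the paper: plain k-means stands in for whichever clustering algorithm is actually used, and the two per-flow features, the sample values, the class names, and the helper names (`kmeans`, `label_clusters`, `classify`) are all hypothetical. Flows are grouped without labels, each cluster inherits the majority label of the few labeled flows it contains, and a new flow receives the label of its nearest cluster.

```python
from collections import Counter
from math import dist


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each flow goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return centroids, assign


def label_clusters(assign, labels):
    """Map each cluster to the majority label among its labeled flows."""
    votes = {}
    for cluster, label in zip(assign, labels):
        if label is not None:
            votes.setdefault(cluster, Counter())[label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def classify(flow, centroids, cluster_label):
    """Give a new flow the label of its nearest cluster, if that cluster has one."""
    nearest = min(range(len(centroids)), key=lambda c: dist(flow, centroids[c]))
    return cluster_label.get(nearest, "unknown")


# Hypothetical header-derived features per flow:
# (mean packet size scaled by 1500 bytes, mean inter-arrival time in seconds).
flows = [(0.95, 0.01), (0.05, 0.50), (0.90, 0.02),
         (0.08, 0.45), (0.93, 0.015), (0.06, 0.55)]
# Only the first two flows carry a (hypothetical) ground-truth label.
labels = ["bulk", "interactive", None, None, None, None]

centroids, assign = kmeans(flows, k=2)
cluster_label = label_clusters(assign, labels)
prediction = classify((0.88, 0.03), centroids, cluster_label)
print(prediction)
```

In this toy setup only two of the six flows are labeled, yet every cluster ends up with a class. Reducing the amount of labeled traffic needed is one common motivation for pairing clustering with classification.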
