Classification of Correlated Internet Traffic Flows

A critical problem in Internet traffic classification is how to obtain a high-performance classifier based on statistical features from a small set of training data. Solving this problem is essential for handling encrypted applications and newly emerging applications. In this paper, we propose a new Naive Bayes (NB) based classification scheme to tackle this problem, which draws on two recent research findings: feature discretization and flow correlation. A new bag-of-flow (BoF) model is first introduced to describe correlated flows, which leads to a new BoF-based traffic classification problem. We cast BoF-based traffic classification as a specific classifier combination problem and theoretically analyze the classification benefit of flow aggregation. A number of combination methods are also formulated and used to aggregate the NB predictions of the correlated flows. Finally, we carry out a number of experiments on a large-scale real-world network dataset. The experimental results show that the proposed scheme achieves significantly higher classification accuracy and much faster classification speed than state-of-the-art traffic classification methods.
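To make the aggregation step concrete, the following is a minimal sketch of how per-flow NB posteriors within one bag of flows might be combined into a single prediction. The abstract does not name the specific combination methods, so the sum, product, and majority-vote rules used here are assumptions drawn from the general classifier-combination literature, and the function name `combine_bof`, the class labels, and the posterior values are all hypothetical.

```python
from collections import Counter
from math import log


def combine_bof(posteriors, rule="sum"):
    """Combine per-flow class posteriors for one bag of flows (BoF).

    posteriors: list of dicts, one per flow, mapping class label -> P(class | flow).
    rule: "sum", "product", or "majority" (assumed rules, not taken from the paper).
    Returns the predicted class label for the whole bag.
    """
    if not posteriors:
        raise ValueError("a bag must contain at least one flow")
    classes = sorted(posteriors[0])

    if rule == "sum":
        # Sum rule: add the posteriors of all flows in the bag.
        scores = {c: sum(p[c] for p in posteriors) for c in classes}
    elif rule == "product":
        # Product rule, computed in log space; eps guards against log(0).
        eps = 1e-12
        scores = {c: sum(log(max(p[c], eps)) for p in posteriors) for c in classes}
    elif rule == "majority":
        # Majority vote: each flow votes for its most probable class.
        votes = Counter(max(classes, key=lambda c: p[c]) for p in posteriors)
        scores = {c: votes.get(c, 0) for c in classes}
    else:
        raise ValueError(f"unknown rule: {rule}")

    # Ties are broken in favor of the alphabetically first class label.
    return max(classes, key=lambda c: scores[c])


# Hypothetical bag of three correlated flows with made-up NB posteriors.
bag = [
    {"http": 0.60, "p2p": 0.40},
    {"http": 0.55, "p2p": 0.45},
    {"http": 0.05, "p2p": 0.95},
]

for rule in ("sum", "product", "majority"):
    print(rule, combine_bof(bag, rule))
```

In this toy bag, two flows weakly favor one class while a third strongly favors the other, so the vote-based rule and the probability-based rules disagree. This illustrates why the choice of combination method matters when aggregating predictions over correlated flows.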
