Robust Traffic Classification with Mislabelled Training Samples

Traffic classification plays the significant role in the network security and management. However, accurate classification is challenging if the training data is contaminated with unclean traffic. Recent researches often assume clean training data, and hence performance reduced on real-time network traffic. To meet this challenge, in this paper, we propose a robust method, Unclean Traffic Classification (UTC), which incorporates noise elimination and suspected noise reweighting. Firstly, UTC eliminates strong noisy training data identified by a consensus filtering with multiple classifiers. Furthermore, UTC estimates the relevance of remaining training data and learns a robust traffic classifier. Through a number of experiments on a real-world traffic dataset, we show that the new method outperforms existing state-of-the-art traffic classification methods, under the extremely difficult circumstance with unclean training data.

[1]  Andrew W. Moore,et al.  Bayesian Neural Networks for Internet Traffic Classification , 2007, IEEE Transactions on Neural Networks.

[2]  Andrew W. Moore,et al.  Internet traffic classification using bayesian analysis techniques , 2005, SIGMETRICS '05.

[3]  Yang Xiang,et al.  An automatic application signature construction system for unknown traffic , 2010 .

[4]  Carey L. Williamson,et al.  Categories and Subject Descriptors: C.4 [Computer Systems Organization]Performance of Systems , 2022 .

[5]  Grenville J. Armitage,et al.  Training on multiple sub-flows to optimise the use of Machine Learning classifiers in real-world IP networks , 2006, Proceedings. 2006 31st IEEE Conference on Local Computer Networks.

[6]  Jun Zhang,et al.  Network Traffic Classification Using Correlation Information , 2013, IEEE Transactions on Parallel and Distributed Systems.

[7]  Xuelong Li,et al.  Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Dario Rossi,et al.  Abacus: Accurate behavioral classification of P2P-TV traffic , 2011, Comput. Networks.

[9]  Xenofontas A. Dimitropoulos,et al.  Classifying internet one-way traffic , 2012, SIGMETRICS '12.

[10]  Jie Wu,et al.  Robust Network Traffic Classification , 2015, IEEE/ACM Transactions on Networking.

[11]  Zhi-Li Zhang,et al.  A Modular Machine Learning System for Flow-Level Traffic Classification in Large Networks , 2012, TKDD.

[12]  Renata Teixeira,et al.  Traffic classification on the fly , 2006, CCRV.

[13]  Grenville J. Armitage,et al.  A survey of techniques for internet traffic classification using machine learning , 2008, IEEE Communications Surveys & Tutorials.

[14]  Michalis Faloutsos,et al.  Internet traffic classification demystified: myths, caveats, and the best practices , 2008, CoNEXT '08.

[15]  Xiang Li,et al.  An Internet Traffic Classification Method Based on Semi-Supervised Support Vector Machine , 2011, 2011 IEEE International Conference on Communications (ICC).

[16]  Carey L. Williamson,et al.  Offline/realtime traffic classification using semi-supervised learning , 2007, Perform. Evaluation.

[17]  Luca Salgarelli,et al.  Support Vector Machines for TCP traffic classification , 2009, Comput. Networks.

[18]  Chung-Horng Lung,et al.  P2P traffic identification and optimization using fuzzy c-means clustering , 2011, 2011 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011).

[19]  J. Erman,et al.  QRP05-4: Internet Traffic Identification using Machine Learning , 2006, IEEE Globecom 2006.

[20]  Minyi Guo,et al.  Flexible Deterministic Packet Marking: An IP Traceback System to Find the Real Source of Attacks , 2009, IEEE Transactions on Parallel and Distributed Systems.

[21]  Anirban Mahanti,et al.  Traffic classification using clustering algorithms , 2006, MineNet '06.

[22]  M. Verleysen,et al.  Classification in the Presence of Label Noise: A Survey , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[23]  Béla Hullár,et al.  Early Identification of Peer-to-Peer Traffic , 2011, 2011 IEEE International Conference on Communications (ICC).

[24]  Michalis Faloutsos,et al.  SubFlow: Towards practical flow-level traffic classification , 2012, 2012 Proceedings IEEE INFOCOM.

[25]  Anthony McGregor,et al.  Flow Clustering Using Machine Learning Techniques , 2004, PAM.

[26]  Judith Kelner,et al.  Better network traffic identification through the independent combination of techniques , 2010, J. Netw. Comput. Appl..