Noise-Resistant Statistical Traffic Classification

Network traffic classification plays a significant role in cyber security applications and management scenarios. Conventional statistical classification techniques assume that clean labelled samples are available for building classification models. However, in the big data era, mislabelled training data are common, owing to the introduction of new applications and a lack of knowledge. Existing statistical traffic classification techniques do not address this problem, so their performance degrades in the presence of mislabelled training data. To meet this challenge, we propose a new scheme, Noise-resistant Statistical Traffic Classification (NSTC), which incorporates noise elimination and reliability estimation into traffic classification. NSTC first eliminates suspected noisy samples and then estimates the reliability of the remaining training data before it builds a robust traffic classifier. Traffic classification experiments on two real-world traffic data sets show that NSTC can effectively address the problem of mislabelled training data. Compared with state-of-the-art methods, NSTC significantly improves classification performance in the context of big, unclean data.
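The general idea of filtering suspect labels and then weighting what remains can be illustrated with a small sketch. This is not the NSTC algorithm itself, whose details the abstract does not give. It assumes, purely for illustration, that a sample's reliability is the fraction of its k nearest neighbours that share its label, that samples below a threshold are discarded, and that the final classifier is a reliability-weighted k-NN vote. The function names, the two-feature flow vectors, and the labels "web" and "p2p" are all hypothetical.

```python
import math
from collections import Counter

def neighbour_labels(x, samples, k, skip=None):
    """Labels of the k samples nearest to x (Euclidean), ignoring index `skip`."""
    ranked = sorted(
        (math.dist(x, xi), yi)
        for i, (xi, yi) in enumerate(samples)
        if i != skip
    )
    return [label for _, label in ranked[:k]]

def filter_and_weight(samples, k=3, min_agreement=0.5):
    """Assumed stand-in for noise elimination plus reliability estimation.

    A sample's reliability is the share of its k nearest neighbours that
    carry the same label; samples below `min_agreement` are dropped.
    Returns a list of (features, label, reliability) tuples.
    """
    kept = []
    for i, (x, y) in enumerate(samples):
        labels = neighbour_labels(x, samples, k, skip=i)
        reliability = labels.count(y) / len(labels)
        if reliability >= min_agreement:
            kept.append((x, y, reliability))
    return kept

def classify(x, weighted, k=3):
    """Reliability-weighted k-NN vote over the retained samples."""
    nearest = sorted(weighted, key=lambda s: math.dist(x, s[0]))[:k]
    scores = Counter()
    for _, label, reliability in nearest:
        scores[label] += reliability
    return scores.most_common(1)[0][0]

# Hypothetical flow features; the fifth sample sits in the "web" cluster
# but carries a "p2p" label, simulating a mislabelled training flow.
flows = [
    ((1.0, 1.0), "web"), ((1.2, 0.9), "web"),
    ((0.9, 1.1), "web"), ((1.1, 1.2), "web"),
    ((1.05, 1.0), "p2p"),
    ((5.0, 5.0), "p2p"), ((5.2, 4.9), "p2p"),
    ((4.9, 5.1), "p2p"), ((5.1, 5.2), "p2p"),
]

kept = filter_and_weight(flows)
print(len(flows), "training flows,", len(kept), "kept after filtering")
print(classify((1.0, 1.05), kept), classify((5.0, 5.1), kept))
```

In this toy setting the mislabelled flow is removed because none of its neighbours share its label, while its correctly labelled neighbours survive with a reduced weight. A real system would likely use a stronger filter and classifier than k-NN.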
