Robust network traffic identification with unknown applications

Traffic classification is a fundamental component of advanced network management and security. Recent research has had some success in applying machine learning techniques to approaches based on flow statistical features. However, most of these methods classify traffic on the assumption that all traffic flows are generated by known applications. Given how pervasive unknown applications are in real-world environments, this assumption does not hold. In this paper, we cast the handling of unknown applications as a classification problem with insufficient negative training data and address it with a framework based on binary classifiers. An iterative method is proposed to extract information about unknown applications from a set of unlabelled traffic flows; it combines asymmetric bagging and flow correlation to guarantee the purity of the extracted negatives. A binary classifier is used as an application signature, and it can operate on a bag of correlated flows instead of on individual flows to further improve its effectiveness. We carry out a series of experiments on a real-world network traffic dataset to evaluate the proposed methods. The results show that the proposed method significantly outperforms state-of-the-art traffic classification methods when unknown applications are present.
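To make the negative-extraction idea more concrete, the following is a minimal sketch and not the method as implemented in the paper. It assumes that asymmetric bagging means keeping every labelled flow of the known application in each round while resampling tentative negatives from the unlabelled flows, and that flow correlation is available as a precomputed bag id per flow. The nearest-centroid rule stands in for whatever binary classifier the paper actually uses, and the names `extract_negatives`, `bag_ids`, `rounds`, and `threshold` are hypothetical.

```python
import numpy as np


def extract_negatives(pos, unl, bag_ids, rounds=15, threshold=0.9, seed=0):
    """Return a boolean mask over `unl` marking flows kept as negatives.

    pos     : (n_pos, d) features of flows labelled as the known application
    unl     : (n_unl, d) features of unlabelled flows (known and unknown mixed)
    bag_ids : (n_unl,) hypothetical id shared by correlated flows
    """
    rng = np.random.default_rng(seed)
    pos_centroid = pos.mean(axis=0)  # all positives are reused in every round
    neg_votes = np.zeros(len(unl))

    for _ in range(rounds):
        # Asymmetric step: only the tentative negatives are resampled.
        idx = rng.choice(len(unl), size=len(pos), replace=True)
        neg_centroid = unl[idx].mean(axis=0)

        # Stand-in binary classifier (assumption): nearest centroid.
        d_pos = np.linalg.norm(unl - pos_centroid, axis=1)
        d_neg = np.linalg.norm(unl - neg_centroid, axis=1)
        neg_votes += d_neg < d_pos

    flow_score = neg_votes / rounds  # fraction of rounds voting "negative"

    # Bag-level decision: correlated flows are accepted or rejected together.
    is_negative = np.zeros(len(unl), dtype=bool)
    for bag in np.unique(bag_ids):
        members = bag_ids == bag
        is_negative[members] = flow_score[members].mean() >= threshold
    return is_negative


# Synthetic illustration: the known application clusters near 0,
# unknown applications cluster near 5 (made-up data, not from the paper).
data_rng = np.random.default_rng(1)
pos = data_rng.normal(0.0, 0.5, size=(50, 2))
unl_known = data_rng.normal(0.0, 0.5, size=(30, 2))
unl_unknown = data_rng.normal(5.0, 0.5, size=(70, 2))
unl = np.vstack([unl_known, unl_unknown])
bag_ids = np.concatenate([
    np.repeat(np.arange(10), 3),        # 10 bags of 3 known-application flows
    10 + np.repeat(np.arange(14), 5),   # 14 bags of 5 unknown-application flows
])

mask = extract_negatives(pos, unl, bag_ids)
print("extracted negatives:", int(mask.sum()), "of", len(unl))
```

Deciding per bag rather than per flow is the design choice this sketch is meant to highlight: a single flow that looks ambiguous on its own inherits the consensus of the flows it is correlated with, which is one plausible way to keep the extracted negatives pure.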
