Semi-supervised and Compound Classification of Network Traffic

This paper presents a new semi-supervised method that improves traffic classification performance when little supervised training data is available. Existing semi-supervised methods label a large proportion of testing flows as unknown because the supervised information is limited, which severely degrades classification performance. To address this problem, we propose to incorporate flow correlation into both the training and testing stages. At the training stage, we use flow correlation to extend the supervised data set by automatically labeling unlabeled flows according to their correlation with the pre-labeled flows. The traffic classifier consequently performs better because the supervised data set is larger and of higher quality. At the testing stage, correlated flows are identified and classified jointly by combining their individual predictions, which further boosts classification accuracy. An empirical study on real-world network traffic shows that the proposed method outperforms state-of-the-art classification methods based on flow statistical features.
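The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: flows are treated as correlated when they share a hypothetical `key` (here a destination IP, destination port, and protocol tuple), labels are propagated and predictions are combined by simple majority vote, and a 1-nearest-neighbor rule stands in for the traffic classifier. All function names and the toy data are made up for illustration.

```python
from collections import Counter, defaultdict

def group_by_key(flows):
    """Group flow indices by correlation key (assumed: dst IP, dst port, protocol)."""
    groups = defaultdict(list)
    for i, flow in enumerate(flows):
        groups[flow["key"]].append(i)
    return groups

def extend_labels(flows, labels):
    """Training stage: give unlabeled flows the majority label of the
    pre-labeled flows that share their correlation key."""
    extended = list(labels)
    for idxs in group_by_key(flows).values():
        known = [labels[i] for i in idxs if labels[i] is not None]
        if known:
            majority = Counter(known).most_common(1)[0][0]
            for i in idxs:
                if extended[i] is None:
                    extended[i] = majority
    return extended

def nearest_neighbor(train_x, train_y, x):
    """Stand-in classifier (an assumption): 1-NN on squared Euclidean distance."""
    def dist(j):
        return sum((a - b) ** 2 for a, b in zip(train_x[j], x))
    return train_y[min(range(len(train_x)), key=dist)]

def classify_jointly(train_x, train_y, test_flows):
    """Testing stage: predict each flow individually, then replace the
    predictions in each correlated group with the group's majority vote."""
    preds = [nearest_neighbor(train_x, train_y, f["x"]) for f in test_flows]
    for idxs in group_by_key(test_flows).values():
        vote = Counter(preds[i] for i in idxs).most_common(1)[0][0]
        for i in idxs:
            preds[i] = vote
    return preds

# Toy data: "x" holds two made-up statistical features per flow.
train_flows = [
    {"key": ("10.0.0.1", 80, "tcp"), "x": (1.0, 1.0)},
    {"key": ("10.0.0.1", 80, "tcp"), "x": (1.2, 0.9)},
    {"key": ("10.0.0.2", 53, "udp"), "x": (5.0, 5.0)},
    {"key": ("10.0.0.2", 53, "udp"), "x": (5.1, 4.8)},
    {"key": ("10.0.0.3", 22, "tcp"), "x": (9.0, 9.0)},
]
labels = ["http", None, "dns", None, None]  # only two flows are pre-labeled

extended = extend_labels(train_flows, labels)
train_x = [f["x"] for f, y in zip(train_flows, extended) if y is not None]
train_y = [y for y in extended if y is not None]

test_flows = [
    {"key": ("10.0.0.9", 80, "tcp"), "x": (1.1, 1.0)},
    {"key": ("10.0.0.9", 80, "tcp"), "x": (0.9, 1.2)},
    {"key": ("10.0.0.9", 80, "tcp"), "x": (3.5, 3.5)},  # ambiguous on its own
]
preds = classify_jointly(train_x, train_y, test_flows)
```

In this toy run the supervised set grows from two labeled flows to four, and the third test flow, which the stand-in classifier would assign to the other class in isolation, is overruled by the two flows it is correlated with. A flow with no labeled neighbor under the assumed key (the last training flow) stays unlabeled.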
