Network Traffic Classification Using Correlation Information

Traffic classification has wide applications in network management, from security monitoring to quality of service measurements. Recent research tends to apply machine learning techniques to classification methods based on flow statistical features. The nearest neighbor (NN)-based method has exhibited superior classification performance. It also has several important advantages: it requires no training procedure, carries no risk of overfitting parameters, and naturally handles a huge number of classes. However, the performance of the NN classifier can be severely degraded when the training data set is small. In this paper, we propose a novel nonparametric approach for traffic classification that improves classification performance by incorporating correlated information into the classification process. We analyze the new classification approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic data sets to validate the proposed approach. The results show that traffic classification performance can be improved significantly even under the extremely difficult circumstance of very few training samples.
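The general idea of classifying correlated flows jointly, rather than one at a time, can be illustrated with a minimal sketch. This is not the method of the paper, whose details are not given in the abstract. It assumes that flows are represented as numeric feature tuples, that a group ("bag") of correlated flows is already known, and that per-class NN distances are combined by summation. The names `nn_distance_per_class`, `classify_flow`, `classify_bag`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

def nn_distance_per_class(flow, training):
    """Return {label: distance from `flow` to its nearest training flow of that label}."""
    best = {}
    for features, label in training:
        d = math.dist(flow, features)  # Euclidean distance (Python 3.8+)
        if label not in best or d < best[label]:
            best[label] = d
    return best

def classify_flow(flow, training):
    """Plain NN: assign the label of the single nearest training flow."""
    dists = nn_distance_per_class(flow, training)
    return min(dists, key=dists.get)

def classify_bag(bag, training):
    """Classify a bag of correlated flows jointly (assumed aggregation rule:
    sum the per-class NN distances over all flows, pick the smallest total)."""
    totals = defaultdict(float)
    for flow in bag:
        for label, d in nn_distance_per_class(flow, training).items():
            totals[label] += d
    return min(totals, key=totals.get)

# Toy data (hypothetical): one training flow per class, i.e. very few samples.
training = [((1.0, 1.0), "web"), ((5.0, 5.0), "p2p")]
# Three flows assumed to be correlated, e.g. generated by the same application.
bag = [(1.2, 0.9), (2.0, 2.2), (3.2, 3.1)]

print(classify_flow(bag[2], training))  # the ambiguous flow on its own
print(classify_bag(bag, training))      # the same flow judged with its bag
```

In this toy setting the third flow lies slightly closer to the "p2p" sample when taken alone, while the bag as a whole is assigned to "web". That is the kind of correction that correlated information can provide when training samples are scarce.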
