On Accuracy of Early Traffic Classification

The wide deployment of traffic encryption, tunneling, and other protection and obfuscation mechanisms in modern network applications has prompted the emergence of classification approaches based on traffic behavior (i.e., packet direction pattern, size, and inter-arrival time). Some proposals even demonstrate the potential of such approaches for on-line early traffic classification: using the first 4-6 data packets of a TCP connection to identify the corresponding application. Nevertheless, the accuracy of early classification when forged packets are present remains unclear. The performance of such mechanisms in a malicious environment, where sophisticated techniques for injecting forged data packets are available, has not been addressed. This work examines these issues, especially the case where forged packets are inserted before the actual application transaction starts. Our contributions are twofold: (1) we confirm the discriminative power of early classification reported in previous work; (2) we explore how vulnerable its accuracy is to forged packets; experiments on both simulated and real SSH tunnel traces show that accuracy declines when forged packets are injected. Our findings show that intelligent early classification methods still deserve further investigation before actual deployment.
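To make the forged-packet concern concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes a hypothetical feature vector made of the signed sizes of the first n data packets (positive for client-to-server, negative for server-to-client), with n = 5 chosen from the 4-6 range mentioned above. The helper names `early_features` and `inject_forged` and all packet sizes are invented for illustration.

```python
from typing import List, Tuple

# A packet is (direction, size_in_bytes); direction is +1 for
# client-to-server and -1 for server-to-client. Hypothetical encoding.
Packet = Tuple[int, int]


def early_features(packets: List[Packet], n: int = 5) -> List[int]:
    """Return the signed sizes of the first n data packets, zero-padded."""
    signed = [direction * size for direction, size in packets[:n]]
    return signed + [0] * (n - len(signed))


def inject_forged(packets: List[Packet], forged: List[Packet]) -> List[Packet]:
    """Prepend forged packets, i.e. inject them before the real exchange."""
    return list(forged) + list(packets)


# Made-up packet sizes for illustration only; not taken from any trace.
genuine = [(1, 120), (-1, 800), (1, 64), (-1, 1448), (1, 52), (-1, 300)]
forged = [(1, 10), (-1, 10)]

clean = early_features(genuine)
attacked = early_features(inject_forged(genuine, forged))

print(clean)     # features seen by the classifier without injection
print(attacked)  # features after two forged packets are prepended
```

In this sketch, the two forged packets occupy the first two feature slots and push the genuine packets toward the end of the window, so a classifier trained on clean flows sees a shifted input. This is one plausible way in which injection before the application transaction could reduce accuracy.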
