An automatic application signature construction system for unknown traffic

Identifying applications and classifying network traffic flows according to their source applications are critical for a broad range of network activities. Such a decision can be based on packet header fields, packet payload content, statistical characteristics of traffic and communication patterns of network hosts. However, most present techniques rely on some sort of a priori knowledge, which means they require labor-intensive preprocessing before running and cannot deal with previously unknown applications. In this paper, we propose a traffic classification system based on application signatures, with a novel approach to fully automate the process of deriving signatures from unidentified traffic. The key idea is to integrate statistics-based flow clustering with payload-based signature matching method, so as to eliminate the requirement of pre-labeled training data sets. We evaluate the efficiency of our approach using real-world traffic trace, and the results indicate that signature classifiers built from clustered data and pre-labeled data are able to achieve similar high accuracy better than 99p. Copyright © 2010 John Wiley & Sons, Ltd.

[1]  Andrew W. Moore,et al.  Internet traffic classification using bayesian analysis techniques , 2005, SIGMETRICS '05.

[2]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[3]  Carey L. Williamson,et al.  Categories and Subject Descriptors: C.4 [Computer Systems Organization]Performance of Systems , 2022 .

[4]  Maurizio Dusi,et al.  Traffic classification through simple statistical fingerprinting , 2007, CCRV.

[5]  Patrick Haffner,et al.  ACAS: automated construction of application signatures , 2005, MineNet '05.

[6]  Renata Teixeira,et al.  Traffic classification on the fly , 2006, CCRV.

[7]  Stefan Savage,et al.  Unexpected means of protocol inference , 2006, IMC '06.

[8]  J. Erman,et al.  QRP05-4: Internet Traffic Identification using Machine Learning , 2006, IEEE Globecom 2006.

[9]  Anirban Mahanti,et al.  Traffic classification using clustering algorithms , 2006, MineNet '06.

[10]  Andrew W. Moore,et al.  Bayesian Neural Networks for Internet Traffic Classification , 2007, IEEE Transactions on Neural Networks.

[11]  Konstantina Papagiannaki,et al.  Toward the Accurate Identification of Network Applications , 2005, PAM.

[12]  Jeffrey Erman,et al.  Internet Traffic Identification using Machine Learning , 2006 .

[13]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[14]  Anthony McGregor,et al.  Flow Clustering Using Machine Learning Techniques , 2004, PAM.

[15]  Sebastian Zander,et al.  Automated traffic classification and application identification using machine learning , 2005, The IEEE Conference on Local Computer Networks 30th Anniversary (LCN'05)l.

[16]  Michalis Faloutsos,et al.  Is P2P dying or just hiding? [P2P traffic measurement] , 2004, IEEE Global Telecommunications Conference, 2004. GLOBECOM '04..

[17]  Ian Witten,et al.  Data Mining , 2000 .

[18]  Oliver Spatscheck,et al.  Accurate, scalable in-network identification of p2p traffic using application signatures , 2004, WWW '04.