Automatic Application Signature Construction from Unknown Traffic

Identifying applications and classifying network traffic flows by their source application is critical for a broad range of network activities. Such classification can be based on packet header fields and payload content, or on the statistical characteristics of flows and the communication patterns of hosts. However, most existing methods rely on some form of a priori knowledge. This paper proposes an application-signature-based traffic classification system with a novel approach that fully automates the derivation of signatures from unknown traffic. The key idea is to combine two steps: traffic clustering based on statistical flow properties, which generates clusters dominated by a single application, and application signature construction based solely on the payload content of each cluster. Evaluation using real-world traffic traces indicates that the proposed approach is highly effective.
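The two-step idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the flow features (mean packet size, duration), the k-means variant with farthest-point initialization, the longest-common-substring signature rule, and the synthetic flows are all assumptions chosen only to keep the example small.

```python
# Illustrative sketch only: cluster flows by statistical features, then
# derive one payload signature per cluster. All names and data are hypothetical.

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def common_substring(payloads, min_len=4):
    """Longest byte string present in every payload (brute force)."""
    ref = min(payloads, key=len)
    for length in range(len(ref), min_len - 1, -1):
        for start in range(len(ref) - length + 1):
            cand = ref[start:start + length]
            if all(cand in p for p in payloads):
                return cand
    return b""

def build_signatures(flows, k):
    """Step 1: cluster on features only. Step 2: signature from payloads only."""
    labels = kmeans([features for features, _ in flows], k)
    clusters = {}
    for (_, payload), label in zip(flows, labels):
        clusters.setdefault(label, []).append(payload)
    return {label: common_substring(group) for label, group in clusters.items()}

# Synthetic flows: ((mean packet size in bytes, duration in seconds), first payload bytes)
FLOWS = [
    ((500, 1.0), b"GET /a HTTP/1.1"),
    ((1400, 60.0), b"\x13BitTorrent protocolAAAA1111"),
    ((520, 1.2), b"GET /bb HTTP/1.1"),
    ((1380, 55.0), b"\x13BitTorrent protocolBBBB2222"),
    ((480, 0.9), b"GET /ccc HTTP/1.1"),
    ((1420, 65.0), b"\x13BitTorrent protocolCCCC3333"),
]

signatures = build_signatures(FLOWS, k=2)
for label, signature in sorted(signatures.items()):
    print(label, signature)
```

Note that the clustering step never inspects payloads and the signature step never inspects flow statistics, which mirrors the separation described above. A practical system would likely need feature scaling, a way to choose the number of clusters, and signatures that tolerate clusters containing a minority of flows from other applications.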
