Mining Unclassified Traffic Using Automatic Clustering Techniques

In this paper we present a fully unsupervised algorithm to identify classes of traffic inside a traffic aggregate. The algorithm builds on the K-means clustering algorithm, augmented with a mechanism that automatically determines the number of traffic clusters. The signatures used for clustering are statistical representations of the application-layer protocols. The proposed technique is extensively tested on UDP traffic traces collected from operational networks. Performance tests show that it can group the traffic into a few tens of pure clusters, achieving an accuracy above 95%. Results are promising and suggest that the proposed approach could effectively be used for automatic traffic monitoring, e.g., to identify the emergence of new applications and protocols, or the presence of anomalous or unexpected traffic.
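The abstract does not specify the signature or the mechanism that picks the number of clusters. The Python sketch below illustrates the overall pipeline under two explicit assumptions. First, the per-flow signature is taken to be a Pearson chi-square statistic computed on 4-bit groups of the first payload bytes, which is one plausible "statistical representation" of a protocol header. Second, the number of clusters is chosen by maximizing the mean silhouette coefficient over k = 2..k_max. This is a standard criterion used here as a stand-in; it is not the paper's own mechanism. All function and parameter names (`chi_square_signature`, `kmeans`, `silhouette`, `auto_kmeans`, `n_bytes`, `k_max`) are hypothetical.

```python
import math


def chi_square_signature(payloads, n_bytes=4):
    """Hypothetical payload signature for a window of packets of one flow.

    ASSUMPTION: the first n_bytes payload bytes are split into 4-bit groups.
    For each group, count how often each of the 16 values appears across the
    window and compute Pearson's chi-square statistic against a uniform
    distribution. Constant fields score high, randomly varying fields score low.
    """
    counts = [[0] * 16 for _ in range(2 * n_bytes)]
    for p in payloads:
        for b in range(n_bytes):
            byte = p[b] if b < len(p) else 0  # pad short payloads with 0 (assumption)
            counts[2 * b][byte >> 4] += 1  # high nibble
            counts[2 * b + 1][byte & 0x0F] += 1  # low nibble
    expected = len(payloads) / 16
    return [sum((o - expected) ** 2 / expected for o in g) for g in counts]


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the old center if a cluster ends up empty
            new_centers.append([sum(x) / len(members) for x in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break  # converged: reassignment no longer moves any center
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters count as 0."""
    n, total = len(points), 0.0
    for i in range(n):
        same = [math.dist(points[i], points[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        others = set(labels) - {labels[i]}
        if not same or not others:
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
                / labels.count(c) for c in others)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def auto_kmeans(points, k_max=10):
    """Pick k automatically by best silhouette over k = 2..k_max.

    ASSUMPTION: silhouette-based selection is a stand-in for the paper's
    (unspecified here) mechanism for choosing the number of clusters.
    """
    best = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]
```

In use, one would compute one signature per flow (or per window of packets of a flow) and pass the list of signatures to `auto_kmeans`, which returns the chosen k and a cluster label per flow. The silhouette computation costs O(n²) distance evaluations, so this pure-Python version only suits small samples.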
