A Comparison of Unsupervised Learning Techniques for Encrypted Traffic Identification

The growing use of encryption, combined with non-standard port assignments, makes traffic identification increasingly difficult. This work benchmarks five unsupervised clustering algorithms (Basic K-Means, Semi-supervised K-Means, DBSCAN, EM, and MOGA) for identifying encrypted traffic, specifically SSH. Results show that MOGA, a multi-objective clustering approach based on a Genetic Algorithm, not only outperforms the others but also offers a good trade-off among detection rate, false positive rate, and the time needed to build and run the model. This trade-off is a desirable property for a practical encrypted traffic identification system.
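As an illustration of the two accuracy metrics named above, the following minimal sketch shows one common way to score a clustering against ground-truth flow labels. It rests on assumptions not stated in the abstract: each cluster is labelled by the majority class of its members, and the function names, the "SSH"/"OTHER" labels, and the toy data are hypothetical. The paper's own labelling and evaluation procedure may differ.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, true_labels):
    """Map each cluster id to the majority ground-truth label of its members.

    Majority labelling is an assumption made for this sketch.
    """
    members = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid][label] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in members.items()}


def detection_and_fp_rate(cluster_ids, true_labels, positive="SSH"):
    """Return (detection rate, false positive rate) for the positive class."""
    mapping = label_clusters(cluster_ids, true_labels)
    predicted = [mapping[cid] for cid in cluster_ids]
    pairs = list(zip(predicted, true_labels))

    tp = sum(p == positive and t == positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    tn = sum(p != positive and t != positive for p, t in pairs)

    # Detection rate = TP / (TP + FN); false positive rate = FP / (FP + TN).
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, fpr


# Hypothetical toy data: eight flows assigned to three clusters.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
true_labels = ["SSH", "SSH", "OTHER", "OTHER", "OTHER", "SSH", "SSH", "SSH"]
dr, fpr = detection_and_fp_rate(cluster_ids, true_labels)
print(f"detection rate = {dr:.2f}, false positive rate = {fpr:.2f}")
```

The same scoring function can be applied to the output of any of the five algorithms, since it only needs a cluster id per flow. Build and run time would have to be measured separately.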
