Using Per-Source Measurements to Improve Performance of Internet Traffic Classification

Obfuscated and encrypted protocols hinder traffic classification by classical techniques such as port analysis or deep packet inspection. There is therefore growing interest in classification algorithms based on statistical analysis of the lengths of the first packets of flows. Most classifiers proposed in the literature are based on machine learning techniques and consider each flow independently of previous source activity (per-flow analysis). In this paper, we propose to use specific per-source information to improve classification accuracy: the sequence of starting times of the flows generated by a single source can be analyzed over time to estimate characteristic statistical parameters, in our case the exponent α of the power law f^(-α) that approximates the power spectral density (PSD) of the flow-arrival counting process. In our method, this measurement is used to train a classifier in addition to the lengths of the first packets of the flows. In our experiments, considering this additional per-source information yielded the same accuracy as using only per-flow data while observing fewer packets in each flow, thus allowing a quicker response. For the proposed classifier, we report performance evaluation results obtained on sets of Internet traffic traces collected at three sites.
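As a rough illustration of the per-source measurement described above, the following sketch bins the flow starting times of one source into a counting process and estimates α as the negated slope of a least-squares line fitted to the periodogram in log-log scale. This is an assumption made for illustration only: the abstract does not state which estimator the authors use, and a plain periodogram fit is a simple substitute that is noisier than more refined estimators. The function names `counting_process` and `estimate_alpha` and the `bin_width` parameter are hypothetical.

```python
import numpy as np


def counting_process(start_times, bin_width):
    """Count flow starts of one source in consecutive bins of bin_width seconds."""
    t = np.asarray(start_times, dtype=float)
    t = t - t.min()  # measure time from the first observed flow
    n_bins = int(np.floor(t.max() / bin_width)) + 1
    return np.bincount((t / bin_width).astype(int), minlength=n_bins)


def estimate_alpha(counts):
    """Estimate alpha in PSD(f) ~ f^(-alpha) from a log-log periodogram fit.

    Illustrative estimator only (assumption), not the paper's method.
    """
    x = np.asarray(counts, dtype=float)
    x = x - x.mean()  # remove the DC component
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)  # raw periodogram
    freqs = np.fft.rfftfreq(len(x))  # normalized frequency, cycles per bin
    keep = (freqs > 0) & (psd > 0)  # log() needs strictly positive values
    slope, _ = np.polyfit(np.log(freqs[keep]), np.log(psd[keep]), 1)
    return -slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic Poisson-like source: independent exponential inter-arrival times.
    starts = np.cumsum(rng.exponential(scale=0.5, size=5000))
    alpha = estimate_alpha(counting_process(starts, bin_width=1.0))
    print(f"estimated alpha: {alpha:.2f}")
```

For a source with memoryless (Poisson-like) flow arrivals the estimate should lie close to 0, while long-range dependent arrival patterns give larger values. The resulting α could then be appended to the per-flow feature vector (the lengths of the first packets) before training the classifier.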
