A Parallelized Network Traffic Classification Based on Hidden Markov Model

This paper implements a network traffic classification method based on a Gaussian Mixture Model-Hidden Markov Model that uses packet-level properties of network traffic flows (PLGMM-HMM). Our method first builds a PLGMM-HMM for each of two packet-level properties, inter-packet time and payload size; we then construct an estimation function by computing the F-Measure obtained when the PLGMM-HMMs classify a second training set. Hadoop Streaming based MapReduce was evaluated while running our classification experiments. Results show that the PLGMM-HMM based method achieves an accuracy above 90% on the collected datasets and outperforms classifiers based on HMMs whose variables follow other distributions. We suggest that this framework could be applied to other machine learning methods as a multi-classifier template.
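As an illustration of the F-Measure step, the following minimal sketch computes a per-class F-Measure from the labels a model assigns to a held-out training set, and then uses those scores to arbitrate between the inter-packet-time model and the payload-size model. The abstract does not specify the form of the estimation function, so the arbitration rule in `combine`, along with all function and variable names, is an assumption made purely for illustration and not the authors' method.

```python
def f_measure(true_labels, predicted_labels, cls):
    """Per-class F-Measure (harmonic mean of precision and recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def combine(pred_ipt, pred_size, f_ipt, f_size):
    """Hypothetical arbitration rule (an assumption, not from the paper):
    trust the model whose F-Measure for its own predicted class is higher.

    pred_ipt / pred_size: class predicted by the inter-packet-time model
                          and by the payload-size model for one flow.
    f_ipt / f_size:       dicts mapping class name to that model's F-Measure.
    """
    if f_ipt.get(pred_ipt, 0.0) >= f_size.get(pred_size, 0.0):
        return pred_ipt
    return pred_size


if __name__ == "__main__":
    # Toy held-out set with made-up class names.
    truth = ["web", "web", "p2p", "p2p"]
    guess = ["web", "p2p", "p2p", "p2p"]
    print(f_measure(truth, guess, "web"))
    print(f_measure(truth, guess, "p2p"))
    print(combine("web", "p2p", {"web": 0.9, "p2p": 0.5}, {"web": 0.4, "p2p": 0.8}))
```

In a Hadoop Streaming setting, scoring of this kind would typically run inside a mapper script that reads flows from standard input, but that wiring is omitted here.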
