Network traffic classification — A comparative study of two common decision tree methods: C4.5 and Random forest

Network traffic classification continues to attract interest as many applications that use obfuscation techniques emerge on different kinds of networks. Decision trees are a supervised machine learning method widely used to identify and classify network traffic. In this paper, we present a comparative study of two common decision tree methods: C4.5 and Random Forest. The study compares the two methods on two factors: classification accuracy and processing time. C4.5 achieved a high classification accuracy, reaching 99.67% for 24000 instances, while Random Forest was faster than C4.5 in terms of processing time.
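For readers unfamiliar with C4.5, the split criterion it is generally known for, the gain ratio (information gain divided by the split information of the attribute), can be sketched as below. This is a minimal illustration and not the implementation used in the study: the function names `entropy` and `gain_ratio` and the four toy flow records are assumptions made up for this example, and the sketch covers only categorical attributes, without pruning or continuous-value handling.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, feature_index):
    """Gain ratio of splitting on one categorical feature.

    rows   -- list of tuples, one tuple of feature values per flow
    labels -- list of class labels, aligned with rows
    """
    total = len(labels)

    # Group the class labels by the value of the chosen feature.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature_index], []).append(label)

    # Information gain = entropy before the split - weighted entropy after it.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    info_gain = entropy(labels) - remainder

    # Split information penalises features with many distinct values.
    split_info = entropy([row[feature_index] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy flows: (transport protocol, port range). Not real data.
rows = [("tcp", "low"), ("tcp", "high"), ("udp", "low"), ("udp", "high")]
labels = ["web", "web", "dns", "dns"]

ratios = [gain_ratio(rows, labels, i) for i in range(2)]
best_feature = max(range(2), key=lambda i: ratios[i])
```

In this toy data the protocol attribute separates the two classes completely while the port range carries no information, so a tree built with this criterion would split on the protocol first. Random Forest differs in that it trains many trees on bootstrap samples with random feature subsets and combines their votes.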
