Network Traffic Classification with Improved Random Forest

Accurate network traffic classification is important to many network activities, such as QoS and network management. As port-based and payload-based classification methods become increasingly difficult to apply, machine learning methods look promising in many respects. In this paper, we improve the standard Random Forest for network traffic classification by setting each variable's selection probability according to that variable's importance. Our test results show that the Improved Random Forest achieves better classification performance and takes less time to build the model.
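The core idea, drawing split candidates with probability tied to variable importance rather than uniformly, can be illustrated with a minimal Python sketch. The abstract does not say how importance is measured or how it is mapped to a probability, so several things here are assumptions: the importance scores are taken as given and non-negative, they are normalized linearly, and a small `floor` keeps every variable selectable. The function names and the example scores are hypothetical and are not taken from the paper.

```python
import random


def selection_probabilities(importances, floor=1e-6):
    """Turn non-negative importance scores into selection probabilities.

    `floor` is an assumption of this sketch: it keeps variables with zero
    importance selectable so that no variable is excluded outright.
    """
    if any(score < 0 for score in importances):
        raise ValueError("importance scores must be non-negative")
    adjusted = [score + floor for score in importances]
    total = sum(adjusted)
    return [score / total for score in adjusted]


def sample_split_candidates(probabilities, k, rng):
    """Draw k distinct variable indices, weighted by `probabilities`.

    A standard Random Forest would draw these k indices uniformly instead.
    """
    if not 0 < k <= len(probabilities):
        raise ValueError("k must be between 1 and the number of variables")
    remaining = list(range(len(probabilities)))
    weights = list(probabilities)
    chosen = []
    for _ in range(k):
        # Pick a position among the variables not yet chosen.
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pick))
        weights.pop(pick)
    return chosen


# Hypothetical importance scores for five flow features.
importances = [0.40, 0.30, 0.15, 0.10, 0.05]
probs = selection_probabilities(importances)
rng = random.Random(0)
candidates = sample_split_candidates(probs, k=2, rng=rng)
print(candidates)
```

In a full forest, a draw like this would happen at each tree node before the best split among the candidates is chosen. Weighting the draw means informative variables are considered more often, which is one plausible reading of how the approach could improve accuracy and shorten model building.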
