Generalization performance analysis of flow-based peer-to-peer traffic identification

In this paper, we develop a peer-to-peer (P2P) traffic identifier to facilitate quality of service (QoS) control in edge routers. Because P2P applications currently consume a large percentage of Internet bandwidth, network optimization strategies are needed to improve network performance, and traffic identification is the most important component of such strategies. We focus on a machine learning strategy that quickly identifies and continuously tracks flows associated with various P2P media streaming and file sharing applications. Using Random Forests (RF) evaluated with 10-fold cross-validation, our method identifies P2P flows with greater than 98% accuracy and 89% precision, at a false positive (FP) rate below 1%. With the help of a winner-take-all strategy, an RF built with data collected from one network generalizes to flows in other networks, achieving accuracy above 97%, precision above 81%, and an FP rate below 2%.
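
The reported metrics and the winner-take-all step can be illustrated with a small sketch. The abstract does not define winner-take-all, so the sketch assumes it means giving a flow the label predicted most often across its successive classification decisions. The function names, the label strings, and the toy data are hypothetical, and the per-decision labels stand in for RF outputs rather than reproducing the paper's classifier.

```python
from collections import Counter


def winner_take_all(decisions):
    """Return the label that occurs most often in a flow's decisions.

    Assumption: this majority vote is one plausible reading of
    "winner-take-all"; the paper may define it differently.
    """
    return Counter(decisions).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive="p2p"):
    """Compute (accuracy, precision, false positive rate) for one class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, precision, fp_rate


if __name__ == "__main__":
    # Hypothetical flows: (per-decision labels, true label).
    flows = {
        "flow_a": (["p2p", "p2p", "other"], "p2p"),
        "flow_b": (["other", "other", "p2p"], "other"),
        "flow_c": (["p2p", "other", "p2p"], "other"),  # ends up a false positive
        "flow_d": (["other", "other", "other"], "other"),
    }
    y_true = [truth for _, truth in flows.values()]
    y_pred = [winner_take_all(decisions) for decisions, _ in flows.values()]
    accuracy, precision, fp_rate = binary_metrics(y_true, y_pred)
    print(f"accuracy={accuracy:.2f} precision={precision:.2f} fp_rate={fp_rate:.2f}")
```

The false positive rate here is FP / (FP + TN), the fraction of non-P2P flows mislabeled as P2P. Because it is computed over a different denominator than precision, a classifier can combine a very low FP rate with a noticeably lower precision when P2P flows are a minority of the traffic.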
