A hybrid heuristics-statistical peer-to-peer traffic classifier

Peer-to-peer (P2P) traffic consumes a significant share of Internet bandwidth and therefore requires effective control. This work proposes a novel hybrid heuristics-statistical approach to classifying P2P traffic. The heuristics approach provides highly accurate P2P detection, but it requires measuring and analyzing many correlations between packets and flows over a certain duration, which makes it unsuitable for online P2P traffic classification. Statistical classification, on the other hand, can classify traffic online, but it needs periodic, often manual, retraining. The proposed hybrid solution merges the two approaches: offline heuristics-based generation of the learning corpus, and online statistical classification. In the first part, heuristics are used to classify traffic flows into three classes, two of which are later used to train the online statistical classifier. This work enhances the existing heuristics-based P2P classification by adding a new class for unknown traffic. Analyses of offline traces using the improved heuristics show that adding the third class reduces class noise from 7% to 2%, and hence provides quality examples for retraining the online statistical classifier. In the second part, machine learning (ML) algorithms are used to classify traffic on the fly based on flow and packet statistics. Using examples generated by the heuristics classifier, the statistical classifier achieves an overall accuracy of 99% on downloaded and captured traces.
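The two-part pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the heuristic rules (`uses_tcp_and_udp`, `well_known_port`), the two flow features (mean packet size and mean inter-arrival time), and the sample values are all hypothetical, and a simple nearest-centroid classifier stands in for the ML algorithms, which the abstract does not name. The sketch only shows the structure: label flows offline into three classes, discard the unknown class, and train an online classifier on the remaining two.

```python
from collections import defaultdict
from statistics import mean

P2P, NON_P2P, UNKNOWN = "p2p", "non_p2p", "unknown"


def heuristic_label(flow):
    """Hypothetical offline heuristic (NOT the paper's actual rules)."""
    if flow["uses_tcp_and_udp"] and not flow["well_known_port"]:
        return P2P
    if flow["well_known_port"] and not flow["uses_tcp_and_udp"]:
        return NON_P2P
    # Ambiguous evidence goes to the third class instead of being forced
    # into P2P or non-P2P, which keeps noisy labels out of the corpus.
    return UNKNOWN


def build_corpus(flows):
    """Keep only the two confidently labelled classes for training."""
    corpus = []
    for flow in flows:
        label = heuristic_label(flow)
        if label != UNKNOWN:
            corpus.append((flow["features"], label))
    return corpus


def train_centroids(corpus):
    """Stand-in statistical model: one mean feature vector per class."""
    grouped = defaultdict(list)
    for features, label in corpus:
        grouped[label].append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Online step: assign the class whose centroid is nearest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical flows; features = (mean packet size in bytes,
# mean inter-arrival time in ms).
flows = [
    {"uses_tcp_and_udp": True, "well_known_port": False, "features": (900.0, 20.0)},
    {"uses_tcp_and_udp": True, "well_known_port": False, "features": (1100.0, 30.0)},
    {"uses_tcp_and_udp": False, "well_known_port": True, "features": (300.0, 200.0)},
    {"uses_tcp_and_udp": False, "well_known_port": True, "features": (200.0, 180.0)},
    {"uses_tcp_and_udp": True, "well_known_port": True, "features": (600.0, 100.0)},
]

corpus = build_corpus(flows)          # the ambiguous fifth flow is dropped
centroids = train_centroids(corpus)
prediction = classify(centroids, (950.0, 40.0))
```

In this sketch the online step uses only per-flow statistics, so it never has to wait for the cross-flow correlations that the heuristic stage depends on. Retraining amounts to rerunning `build_corpus` and `train_centroids` on newer traces.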
