Multiple vector classification for P2P traffic identification

The identification of P2P traffic has become a major concern for the research community in recent years. Although several P2P traffic identification proposals can be found in the specialized literature, the problem persists, mainly because of traffic obfuscation and privacy concerns. This paper presents a flow-based P2P traffic identification scheme built on a multiple classification procedure. First, every monitored traffic flow is parameterized using three different groups of features: time-related features, data transfer features and signalling features. Next, a flow identification process is performed separately for each group of features. Finally, a global identification is obtained by combining the three individual classifications. Promising experimental results have been obtained using a basic k-nearest neighbours (KNN) scheme as the classifier. These results provide some insight into the relevance of each group of features considered and demonstrate the validity of our approach for identifying P2P traffic reliably without content inspection.
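
The three-step procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the individual features, the distance measure, the value of k, or the rule used to combine the three classifications. The sketch therefore assumes hypothetical normalized feature vectors, Euclidean distance, k = 3, and a simple majority vote across the three per-group decisions. All names (`GROUPS`, `knn_predict`, `classify_flow`) and all data values are illustrative.

```python
import math
from collections import Counter

# The three feature groups named in the abstract. The keys are illustrative.
GROUPS = ("time", "transfer", "signalling")


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors.

    `train` is a list of (vector, label) pairs. Euclidean distance is an
    assumption; the abstract only says "a basic KNN scheme".
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classify_flow(training_flows, flow, k=3):
    """Classify a flow once per feature group, then combine the decisions.

    The combination rule (majority vote over the three group labels) is an
    assumption; the abstract does not state how the results are combined.
    """
    votes = {}
    for group in GROUPS:
        train = [(f[group], f["label"]) for f in training_flows]
        votes[group] = knn_predict(train, flow[group], k)
    final = Counter(votes.values()).most_common(1)[0][0]
    return final, votes


# Toy training set with made-up, already normalized feature values.
training_flows = [
    {"time": (0.8, 0.9), "transfer": (0.7, 0.6), "signalling": (0.9,), "label": "p2p"},
    {"time": (0.7, 0.8), "transfer": (0.8, 0.7), "signalling": (0.8,), "label": "p2p"},
    {"time": (0.9, 0.7), "transfer": (0.6, 0.8), "signalling": (0.7,), "label": "p2p"},
    {"time": (0.1, 0.2), "transfer": (0.2, 0.1), "signalling": (0.1,), "label": "non-p2p"},
    {"time": (0.2, 0.1), "transfer": (0.1, 0.3), "signalling": (0.2,), "label": "non-p2p"},
    {"time": (0.3, 0.2), "transfer": (0.3, 0.2), "signalling": (0.3,), "label": "non-p2p"},
]

# A flow whose data transfer features resemble non-P2P traffic, while its
# time-related and signalling features resemble P2P traffic.
flow = {"time": (0.75, 0.85), "transfer": (0.25, 0.2), "signalling": (0.85,)}

final, votes = classify_flow(training_flows, flow)
print(final, votes)
```

In this toy case the data transfer group alone would label the flow as non-P2P, but the other two groups outvote it. Keeping the per-group decisions in `votes` also makes it possible to inspect how much each feature group contributes, which is the kind of insight the abstract refers to.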
