Retraining Mechanism for On-Line Peer-to-Peer Traffic Classification

Peer-to-Peer (P2P) detection using machine learning (ML) classification is affected by the quality and recency of its training data. In this paper, a practical retraining mechanism is proposed that retrains an on-line P2P ML classifier as network traffic behavior changes. The mechanism evaluates the accuracy of the on-line P2P ML classifier against training datasets containing flows labeled by a heuristic-based training dataset generator. The on-line P2P ML classifier is retrained if its accuracy falls below a predefined threshold. The proposed system has been evaluated on traces captured from the Universiti Teknologi Malaysia (UTM) campus network between October and November 2011. The overall results show that the training dataset generator produces accurate training datasets, classifying P2P flows with high accuracy (98.47%) and a low false positive rate (1.37%). The on-line P2P ML classifier, which is built on the J48 algorithm, has been demonstrated to be capable of self-retraining over time.
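The retraining rule described above (score the current classifier on a freshly labeled batch, retrain when accuracy drops below a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `ThresholdStump` is a toy one-feature stand-in for the J48 classifier, the single numeric feature is hypothetical, and the `0.9` threshold is an assumed value, since the abstract does not state the one used.

```python
from typing import List, Tuple

Flow = Tuple[float, ...]      # hypothetical per-flow feature vector
Labeled = Tuple[Flow, str]    # (features, label from the heuristic generator)


class ThresholdStump:
    """Toy stand-in for the on-line classifier (NOT J48).

    Splits on feature 0 at the midpoint between the two class means.
    Assumes the training batch contains both classes.
    """

    def fit(self, data: List[Labeled]) -> "ThresholdStump":
        p2p = [x[0] for x, y in data if y == "P2P"]
        other = [x[0] for x, y in data if y != "P2P"]
        mean_p2p = sum(p2p) / len(p2p)
        mean_other = sum(other) / len(other)
        self.cut = (mean_p2p + mean_other) / 2
        self.p2p_above = mean_p2p > mean_other
        return self

    def predict(self, x: Flow) -> str:
        above = x[0] > self.cut
        return "P2P" if above == self.p2p_above else "non-P2P"


def accuracy(model: ThresholdStump, data: List[Labeled]) -> float:
    """Fraction of flows whose predicted label matches the heuristic label."""
    correct = sum(1 for x, y in data if model.predict(x) == y)
    return correct / len(data)


def retrain_if_stale(model: ThresholdStump, batch: List[Labeled],
                     threshold: float = 0.9):
    """Return (model, measured_accuracy, retrained_flag).

    The threshold default is an assumption made for this sketch.
    """
    acc = accuracy(model, batch)
    if acc < threshold:
        # Accuracy fell below the threshold: rebuild from the new batch.
        return ThresholdStump().fit(batch), acc, True
    return model, acc, False


# Usage: train on an initial batch, then check a batch whose behavior drifted.
initial = [((10.0,), "P2P"), ((12.0,), "P2P"),
           ((1.0,), "non-P2P"), ((2.0,), "non-P2P")]
drifted = [((1.0,), "P2P"), ((2.0,), "P2P"),
           ((10.0,), "non-P2P"), ((12.0,), "non-P2P")]

model = ThresholdStump().fit(initial)
model, acc_before, retrained = retrain_if_stale(model, drifted)
```

Keeping the accuracy check separate from the classifier means the stand-in could be swapped for a real decision-tree learner without changing the retraining logic.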
