Traffic classification combining flow correlation and ensemble classifier

Traffic classification has wide applications in network measurement, network security, and quality of service. Recent research tends to apply machine learning methods based on statistical flow features to improve classification performance. In this paper, we propose a novel non-parametric approach to traffic classification that improves classification performance by incorporating correlation information among flows into the classification process. In addition, our system employs a lightweight modular architecture that combines a series of simple linear binary classifiers, each of which can be efficiently implemented and trained in parallel on vast amounts of flow data, achieving scalability while attaining high accuracy. Extensive experiments on real traffic data validate the proposed approach. The results show that classification performance improves significantly while the scalability and stability requirements of large networks are met.
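The two ideas named above, a bank of simple linear binary classifiers and the use of flow correlation, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the binary classifiers are trained, how their outputs are combined, or how correlated flows are identified. The sketch assumes a one-vs-all perceptron per application class, assumes that flows sharing a hypothetical key (for example destination IP, destination port, and protocol) are correlated, and assumes that per-class scores are averaged within each correlated group. All function names and the toy feature vectors are illustrative.

```python
from collections import defaultdict


def score(w, x):
    """Linear score w.x + b, where the last entry of w is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_perceptron(X, y, epochs=20):
    """Train one linear binary classifier; y holds +1 / -1 targets."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, target in zip(X, y):
            if target * score(w, x) <= 0:  # misclassified: update weights
                for i, xi in enumerate(x):
                    w[i] += target * xi
                w[-1] += target
    return w


def train_one_vs_all(X, labels):
    """One independent binary classifier per class (assumed combination
    scheme); each could be trained in parallel since they share no state."""
    return {
        c: train_perceptron(X, [1 if lab == c else -1 for lab in labels])
        for c in sorted(set(labels))
    }


def predict(model, x):
    """Classify a single flow in isolation: highest-scoring class wins."""
    return max(model, key=lambda c: score(model[c], x))


def classify_correlated(model, flows, keys):
    """Classify flows jointly: flows with the same key (an assumed notion
    of correlation) share the class with the highest average score."""
    groups = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)
    result = [None] * len(flows)
    for members in groups.values():
        avg = {
            c: sum(score(w, flows[i]) for i in members) / len(members)
            for c, w in model.items()
        }
        best = max(avg, key=avg.get)
        for i in members:
            result[i] = best
    return result


if __name__ == "__main__":
    # Toy two-feature flows; the feature meanings are hypothetical.
    X = [(2, 0), (3, 0), (0, 2), (0, 3)]
    labels = ["web", "web", "p2p", "p2p"]
    model = train_one_vs_all(X, labels)

    flows = [(2, 0), (0.4, 0.5)]
    same_endpoint = ("10.0.0.1", 80, "tcp")  # hypothetical correlation key
    print([predict(model, f) for f in flows])
    print(classify_correlated(model, flows, [same_endpoint, same_endpoint]))
```

In this toy run the second flow lies close to the decision boundary and is labelled `p2p` when classified alone, but it takes the label `web` once its scores are pooled with the clearly classified flow sharing its key. That is the intuition behind using correlated information, under the assumptions stated above.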
