Near Real Time Online Flow-based Internet Traffic Classification Using Machine Learning (C4.5)

Offering reliable novel services in modern heterogeneous networks is a key challenge and an important prospective source of income for many network operators and providers. Providing reliable future services in a cost-effective, scalable manner requires efficient use of networking and computing resources. This can be achieved by making the network more self-enabled, i.e. capable of making distributed local decisions about the use of available resources. However, such decisions must be correlated in order to achieve the overall global goal of maximizing performance and minimizing cost. Because network administrators need to make fast decisions when monitoring and regulating Internet traffic, a novel approach to online flow-based network traffic classification is proposed. The proposal is based on the C4.5 machine learning algorithm and a custom-built network traffic data set captured in a university campus environment. The aim of this effort is to build a complete online flow-based traffic classification and control system. The proposed system is validated in terms of both accuracy and timing. First, offline training and testing data sets are applied to both Weka's C4.5 and our system, and the resulting accuracies are compared. Our experimental results show that the accuracy is exactly the same. Second, the received UDP NetFlow packets are sent both to our system and to a basic packet-sniffing program, and the number of NetFlow packets is counted in each. The comparison shows that no packets are overwritten due to race conditions.
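As a rough illustration of the kind of decision C4.5 makes when it builds a tree over flow features, the sketch below computes the gain ratio of a binary split on one numeric attribute. It is a simplified sketch, not the system described above: the feature (mean packet size per flow), the class labels, and the function names are hypothetical, and a full C4.5 implementation adds further refinements such as pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` vs `value > threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    # Expected entropy after the split, weighted by branch size.
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - conditional
    # Split information penalises very uneven partitions.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Pick the observed value whose split has the highest gain ratio."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical toy flows: mean packet size in bytes and an application label.
mean_pkt_size = [60, 64, 70, 1200, 1400, 1500]
app_label = ["dns", "dns", "dns", "http", "http", "http"]

threshold = best_threshold(mean_pkt_size, app_label)
print(threshold, gain_ratio(mean_pkt_size, app_label, threshold))
```

On this toy data the threshold that separates the two classes cleanly receives the highest score, which is the behaviour the split criterion is designed to reward.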
