Large traffic flows classification method

To ensure quality of experience (QoE) for users who access many Internet applications every day, ISPs face both challenges and opportunities in bandwidth management. They need ways to identify the flows that each application generates on user hosts, especially for application classes with large flows, which occupy more bandwidth than classes with small flows. A novel method is presented to modularize flow size using information gain ratio. The original dataset is partitioned into a large-flow subset and a small-flow subset by a threshold, which is chosen so that the data complexity of the large-flow subset is minimized. The algorithm that searches for this partition threshold is independent of classification performance. Dedicated classifiers can then be trained on each subset so that both large flows and small flows are identified with good generalization. Experimental results on real-world traffic datasets show that byte accuracy increases by 30% on average when our method is compared with the original approach.
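To make the idea of a size threshold concrete, the following is a minimal sketch of how information gain ratio can score a binary split of flows into "small" and "large" by size. It is an illustration under stated assumptions, not the method evaluated here: the abstract does not specify the data complexity measure or the search procedure, so the sketch simply scans the observed flow sizes and keeps the threshold with the highest gain ratio. The function names (`entropy`, `gain_ratio`, `best_threshold`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(sizes, labels, threshold):
    """Gain ratio of the binary split size <= threshold vs. size > threshold."""
    small = [y for s, y in zip(sizes, labels) if s <= threshold]
    large = [y for s, y in zip(sizes, labels) if s > threshold]
    if not small or not large:
        return 0.0  # degenerate split: one side is empty
    n = len(labels)
    # Class entropy remaining after the split, weighted by subset size.
    conditional = (len(small) / n) * entropy(small) + (len(large) / n) * entropy(large)
    info_gain = entropy(labels) - conditional
    # Split information: entropy of the small/large partition itself.
    split_info = entropy(["small"] * len(small) + ["large"] * len(large))
    return info_gain / split_info


def best_threshold(sizes, labels):
    """Assumed search: try every observed size except the largest as a cut point."""
    candidates = sorted(set(sizes))[:-1]
    if not candidates:
        return None  # all flows have the same size, nothing to split
    return max(candidates, key=lambda t: gain_ratio(sizes, labels, t))


# Toy data (hypothetical): flow sizes in bytes and application class labels.
sizes = [100, 200, 300, 50_000, 80_000, 120_000]
labels = ["web", "web", "web", "p2p", "p2p", "p2p"]
threshold = best_threshold(sizes, labels)
```

On this toy data the split at 300 bytes separates the two classes perfectly, so it receives the highest gain ratio. A real threshold search would run over measured flow sizes and, following the description above, would use a data complexity criterion on the large-flow subset rather than the gain ratio alone.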
