Balanced feature selection method for Internet traffic classification

In Internet traffic classification, the class imbalance problem is mainly addressed by adjusting the class distribution. However, feature selection is also a key factor contributing to this problem. Therefore, a new filter feature selection method called balanced feature selection (BFS) is proposed. Every feature is measured both locally and globally, and an optimal feature subset is then selected by the proposed search model. A certainty coefficient is presented to measure, locally, the correlation between a feature and a particular class. The symmetric uncertainty is utilised to measure, globally, the correlation between a feature and all classes. BFS is compared with five existing feature selection methods through experiments on two real traffic traces using three classification algorithms. Results show that BFS outperforms the other methods, improving g-mean by more than 15.29%. Averaged over all datasets and classifiers, BFS achieves 59.54% g-mean, 86.35% Mauc and 91.42% overall accuracy.
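
As an illustration of the global measure and of the g-mean metric mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the standard definition of symmetric uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), applied to already-discretised feature values, and it assumes that g-mean is the geometric mean of the per-class recalls, which is the common multi-class definition. The function names `entropy`, `symmetric_uncertainty` and `g_mean` are hypothetical. The certainty coefficient is not defined in the abstract, so it is not sketched here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(feature, labels):
    """Standard symmetric uncertainty between a discrete feature and the labels.

    Assumption: the feature is already discretised; the paper's own
    preprocessing is not described in the abstract.
    """
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing can be shared
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multi-class definition)."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    labels = ["web", "web", "p2p", "p2p"]
    print(symmetric_uncertainty([0, 0, 1, 1], labels))  # feature determines the class
    print(symmetric_uncertainty([0, 1, 0, 1], labels))  # feature independent of the class
    print(g_mean(labels, ["web", "web", "p2p", "web"]))
```

Because g-mean multiplies the per-class recalls, a classifier that never recognises a minority class scores zero however high its overall accuracy is, which is why the metric is suited to imbalanced traffic classes.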
