Effective Feature Selection for 5G IM Applications Traffic Classification

Recently, machine learning (ML) algorithms have been widely applied to Internet traffic classification. However, because of inappropriate feature selection, ML-based classifiers are prone to misclassifying Internet flows as the traffic class that accounts for the majority of flows. To address this problem, we propose a novel feature selection metric named weighted mutual information (WMI). We then develop a hybrid feature selection algorithm named WMI_ACC, which first filters out most of the features using the WMI metric and then uses a wrapper method to select features for ML classifiers using the accuracy (ACC) metric. We evaluate our approach using five ML classifiers on traces captured in two different network environments. Furthermore, we apply the Wilcoxon pairwise statistical test to the results of our proposed algorithm to identify the robust features within the selected set. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision, achieving 99% flow accuracy.
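The filter-then-wrapper structure described above can be illustrated with a minimal sketch. It is not the WMI_ACC algorithm itself: the abstract does not define the WMI weighting, so plain (unweighted) empirical mutual information stands in for the filter metric, and a leave-one-out 1-nearest-neighbour classifier stands in for the five ML classifiers. The function names, the toy flow records, and the `keep` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def filter_stage(rows, labels, keep):
    """Filter step: rank feature indices by MI with the label, keep the top `keep`.
    Plain MI is an assumed stand-in; the paper's WMI weighting is not shown here."""
    scores = {
        j: mutual_information([r[j] for r in rows], labels)
        for j in range(len(rows[0]))
    }
    return sorted(scores, key=scores.get, reverse=True)[:keep]

def loo_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a 1-NN classifier using Hamming distance
    restricted to the feature indices in `subset`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for k, other in enumerate(rows):
            if k == i:
                continue
            dist = sum(row[j] != other[j] for j in subset)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[k]
        correct += best_label == labels[i]
    return correct / len(rows)

def wrapper_stage(rows, labels, candidates):
    """Wrapper step: greedy forward selection driven by accuracy (ACC).
    Stops as soon as no remaining candidate improves accuracy."""
    selected, best_acc = [], 0.0
    improved = True
    while improved:
        improved, best_j = False, None
        for j in candidates:
            if j in selected:
                continue
            acc = loo_accuracy(rows, labels, selected + [j])
            if acc > best_acc:
                best_acc, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return selected, best_acc

# Hypothetical toy flows: three discretised features per flow.
# Feature 0 tracks the class, feature 1 is constant, feature 2 is weakly related.
rows = [(0, 1, 0), (0, 1, 1), (0, 1, 0), (1, 1, 1), (1, 1, 0), (1, 1, 1)]
labels = ["chat", "chat", "chat", "web", "web", "web"]

candidates = filter_stage(rows, labels, keep=2)
selected, accuracy = wrapper_stage(rows, labels, candidates)
print(candidates, selected, accuracy)
```

On this toy data the filter discards the constant feature, and the wrapper keeps only the feature that already separates the two classes. The cheap filter pass exists to shrink the candidate set before the wrapper runs, since the wrapper retrains and rescores the classifier for every candidate subset it tries.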
