Robust Feature Selection for IM Applications at Early Stage Traffic Classification Using Machine Learning Algorithms

Accurately identifying network traffic at its early stage is important for network traffic management and application traffic classification, and early-stage identification has become an active research topic in recent years. Unidirectional and bidirectional statistical features are effective and widely used in Internet traffic classification. However, the features used to classify Instant Messaging (IM) application traffic at the early stage still need to be evaluated and selected. In this paper, we aim to find robust and effective features for early-stage classification. We first extract 22 statistical features of the first flow from traffic datasets collected in two different network environments, the HIT and NIMS datasets. We then compute the mutual information between the extracted statistical features to select the effective ones. To select robust features, we additionally apply CfsSubsetEval attribute selection with best-first search, which picks the robust and stable features from the result obtained by mutual information. Finally, we run 10 well-known machine learning classifiers. Our experimental results show that max_fpktl, std_bpktl, max_biat, mean_fpktl, mean_bpktl and min_biat are robust features for early-stage traffic classification.
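The mutual-information step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each feature is scored against the application label, that numeric features are discretized with equal-width bins, and that the helper names (`mutual_information`, `discretize`, `rank_features`), the bin count, the application labels, and the toy flow values are all hypothetical. Only the feature names max_fpktl and min_biat come from the abstract.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y), in bits, between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def discretize(values, bins=4):
    """Equal-width binning of a numeric feature (bin count is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def rank_features(table, labels, bins=4):
    """Score every feature column against the labels, highest score first."""
    scores = {
        name: mutual_information(discretize(column, bins), labels)
        for name, column in table.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy flows: one value per flow for each feature (illustration only).
flows = {
    "max_fpktl": [1400, 1380, 1420, 200, 180, 220],
    "min_biat": [5, 50, 7, 48, 6, 52],
}
# Hypothetical application labels, one per flow.
labels = ["app_a", "app_a", "app_a", "app_b", "app_b", "app_b"]

ranking = rank_features(flows, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy data max_fpktl separates the two labels perfectly and therefore ranks first, while min_biat carries almost no information about the label. A subset evaluator such as CfsSubsetEval would then be applied to the top-ranked features to remove redundant ones.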
