Effective Packet Number for 5G IM WeChat Application at Early Stage Traffic Classification

Accurate early-stage network traffic classification is important for 5G network applications. In recent years, researchers have worked to develop effective machine learning models that classify Internet traffic applications at an early stage, using only the first few packets of a flow. Nevertheless, this problem still needs deeper study to determine both the effective packet number and an effective machine learning (ML) model. This paper addresses that problem. We use five Internet traffic datasets. First, we extract the sizes of the first 20 packets of each flow and then perform mutual information analysis to measure how much information each packet carries about the flow type. We then run 10 well-known ML algorithms using a crossover classification method. Two statistical tests, the Friedman test and the Wilcoxon pairwise test, are applied to the experimental results. We also apply these statistical tests to the classifiers to identify an effective ML classifier. Our experimental results show that 13–19 packets are the effective packet numbers for early-stage traffic classification of the 5G instant messaging (IM) application WeChat. We also find that Random Forest is an effective classifier for early-stage Internet traffic classification.
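The two analysis steps described above, per-packet mutual information and the Friedman test over classifier results, can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes packet sizes are discretized into fixed-width bins (the 100-byte bin width and the placeholder bin -1 for flows with fewer than k packets are hypothetical choices), estimates mutual information in bits from empirical counts, and computes the Friedman chi-square statistic without a tie correction. The function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed (x, y): p(x,y) * log2(p(x,y) / (p(x) * p(y))), with p = count / n.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def per_packet_mi(flows, labels, n_packets=20, bin_width=100):
    """MI between the binned size of packet k and the flow label, for k = 1..n_packets.

    flows: one list of packet sizes (bytes) per flow, in arrival order.
    Hypothetical choices: fixed-width size bins, and bin -1 for flows with fewer than k packets.
    """
    scores = []
    for k in range(n_packets):
        xs = [f[k] // bin_width if k < len(f) else -1 for f in flows]
        scores.append(mutual_information(xs, labels))
    return scores


def friedman_statistic(results):
    """Friedman chi-square statistic (no tie correction).

    results[i][j] is the score (e.g. accuracy) of classifier j on block i
    (a dataset or packet-number setting). A higher score gets the better (lower) rank;
    tied scores share the average of their ranks.
    """
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
            for t in range(i, j + 1):
                rank_sums[order[t]] += avg_rank
            i = j + 1
    # chi2_F = 12 / (n k (k+1)) * sum_j R_j^2 - 3 n (k+1), where R_j is classifier j's rank sum
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
```

For example, with four toy flows whose second and third packet sizes separate the two classes perfectly, `per_packet_mi(flows, labels, n_packets=3)` assigns 0 bits to the first packet position and 1 bit to the other two. To obtain p-values rather than only the statistic, library routines such as `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon` can be used instead; the SciPy Friedman routine may apply a tie correction, so its statistic can differ slightly from this sketch when scores are tied.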
