Classification model for imbalanced traffic data based on secondary feature extraction

The class imbalance of network traffic data leads to imbalanced classification results. Feature extraction is an effective way to reduce data dimensionality, but it can further amplify the effect of this imbalance. This study proposes a secondary feature extraction algorithm based on multidimensional assessment: network traffic features are evaluated along several dimensions, and these evaluations provide the basis for feature extraction. Furthermore, a model for handling imbalanced data is proposed that combines secondary feature extraction with sampling, so that it benefits from both dimensionality reduction and class redistribution. Experimental results show that the proposed model not only increases classification accuracy and reduces classification imbalance, but also improves the performance of different classification algorithms.
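The abstract does not specify the assessment dimensions, the extraction criterion, or the sampling scheme. The block below is only a minimal sketch of the general idea under stated assumptions. Features are scored along two hypothetical dimensions: a class-balanced Fisher-style relevance score, where each class counts once regardless of size, and redundancy, measured as absolute Pearson correlation. They are selected in two passes, and the reduced data is then rebalanced by random undersampling. All names and parameters (`two_stage_select`, `first_k`, `final_k`, `redundancy_weight`, `random_undersample`, `prepare`) are illustrative inventions, not the paper's algorithm.

```python
import random
import statistics
from collections import defaultdict


def balanced_fisher_score(column, labels):
    """Hypothetical relevance dimension: variance of the per-class means divided
    by the average within-class variance. Each class counts once, whatever its
    size, so minority classes are not drowned out by the majority class."""
    by_class = defaultdict(list)
    for value, label in zip(column, labels):
        by_class[label].append(value)
    means = [statistics.fmean(v) for v in by_class.values()]
    within = [statistics.pvariance(v) for v in by_class.values()]
    return statistics.pvariance(means) / (statistics.fmean(within) + 1e-12)


def abs_correlation(a, b):
    """Hypothetical redundancy dimension: absolute Pearson correlation of two
    columns; 0.0 if either column is constant."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / (va * vb) ** 0.5


def two_stage_select(rows, labels, first_k, final_k, redundancy_weight=0.5):
    """Return the indices of the selected feature columns."""
    columns = [list(col) for col in zip(*rows)]
    relevance = [balanced_fisher_score(c, labels) for c in columns]
    top = max(relevance) or 1.0
    relevance = [r / top for r in relevance]  # rescale to [0, 1]
    # Stage 1: keep the first_k most relevant features.
    candidates = sorted(range(len(columns)), key=lambda j: relevance[j],
                        reverse=True)[:first_k]
    selected = []

    def score(j):
        # Stage 2 criterion: relevance minus weighted redundancy with the
        # features already selected.
        if not selected:
            return relevance[j]
        redundancy = max(abs_correlation(columns[j], columns[s]) for s in selected)
        return relevance[j] - redundancy_weight * redundancy

    # Stage 2: greedily add the best-scoring candidate until final_k are chosen.
    while candidates and len(selected) < final_k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def random_undersample(rows, labels, seed=0):
    """Shrink every class to the size of the smallest class (random, seeded)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for label in sorted(by_class):
        for row in rng.sample(by_class[label], target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def prepare(rows, labels, first_k, final_k, seed=0):
    """Select features on the full data, then rebalance the reduced data."""
    chosen = two_stage_select(rows, labels, first_k, final_k)
    reduced = [[row[j] for j in chosen] for row in rows]
    return chosen, random_undersample(reduced, labels, seed)


if __name__ == "__main__":
    # Toy flows: column 1 is column 0 times 2 (redundant), column 2 is noise.
    rows = [[1, 2, 3, 1], [2, 4, 3, 2], [1, 2, 4, 1], [2, 4, 4, 2],
            [1, 2, 3, 1], [2, 4, 4, 2], [9, 18, 3, 1], [10, 20, 4, 2],
            [1, 2, 4, 9], [2, 4, 3, 10]]
    labels = ["web"] * 6 + ["p2p"] * 2 + ["dns"] * 2
    chosen, (bal_rows, bal_labels) = prepare(rows, labels, first_k=3, final_k=2)
    print(chosen, bal_labels)
```

In this toy run the redundancy pass keeps only one of the two duplicated columns and picks the column that separates the other minority class instead. Selecting features before sampling is itself an assumption: it lets every majority-class sample inform the scores, while the class-balanced relevance keeps the majority class from dominating the ranking.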
