A class-oriented feature selection approach for multi-class imbalanced network traffic datasets based on local and global metrics fusion

Feature selection is often used as a pre-processing step in machine-learning-based network traffic classification. Many feature selection techniques have been developed to find an optimal subset of relevant features and to improve overall classification accuracy. However, such techniques ignore the class imbalance problem encountered in network traffic classification: the selected feature subset may be biased toward the traffic classes that account for the majority of flows on the Internet. To address this issue, this paper proposes a new approach, called class-oriented feature selection (COFS), that identifies a relevant feature subset for every class. COFS combines a proposed local metric with an existing global metric to yield a potentially optimal feature subset for each class, and then removes the redundant features from each subset based on weighted symmetric uncertainty. Additionally, to improve generalization on network traffic data, an ensemble-learning-based scheme is used together with COFS to counter the negative impact of data drift on a traffic classifier. Experiments on real-world network traffic data show that COFS outperforms existing feature selection techniques in most cases. Moreover, the approach achieves, on average, more than 96% flow accuracy and more than 93% byte accuracy.
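To make the redundancy-removal step more concrete, the following is a minimal sketch of the standard (unweighted) symmetric uncertainty between two discrete variables, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). It is not the weighted variant used by COFS, whose weighting scheme is not given in this abstract; the function names `entropy` and `symmetric_uncertainty` are illustrative assumptions, and features are assumed to be already discretized.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Standard symmetric uncertainty: 2 * I(X; Y) / (H(X) + H(Y)).

    Returns a value in [0, 1]; 0 means no shared information and
    1 means each variable fully determines the other.
    NOTE: this is the plain, unweighted measure, not the paper's
    weighted version.
    """
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy     # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)
```

A feature whose symmetric uncertainty with an already selected feature is close to 1 carries almost the same information and is a candidate for removal; how COFS weights this measure per class is left to the full paper.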
