Spark-Based Feature Selection Algorithm of Network Traffic Classification

As the scale of data in network traffic classification grows rapidly, selecting traffic features efficiently has become a major challenge. In this paper, we propose a redundant-window-based optimal feature subset discovery algorithm for feature selection, which uses a growth algorithm to discover relevant features and a shrink algorithm to eliminate redundant ones. Window redundancy and Spark, a parallel computing framework, are integrated into the algorithm, which significantly improves its efficiency. The experimental results show that our method performs well in terms of accuracy and scalability, and improves the execution efficiency of both feature selection and traffic classification.
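The abstract does not state the exact relevance and redundancy criteria used by the growth and shrink steps. The following is a minimal single-machine sketch of a generic grow-shrink scheme, assuming discretized features scored by empirical conditional mutual information and a fixed threshold. The function names (`grow`, `shrink`, `grow_shrink`, `cond_mutual_info`) and the 0.01-bit threshold are hypothetical. The paper's window-redundancy mechanism and its Spark parallelization are not modeled here.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical sketch: a generic grow-shrink feature selector scored by
# empirical (conditional) mutual information. Not the paper's exact method.

def entropy(values: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def cond_mutual_info(x: Sequence[Hashable], y: Sequence[Hashable],
                     z: Sequence[Hashable]) -> float:
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), estimated from counts."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))

def _context(columns: Dict[str, List[int]], names: List[str], n: int) -> List[Tuple]:
    """Joint value of the named features per sample (empty tuple if none)."""
    return [tuple(columns[f][i] for f in names) for i in range(n)]

def grow(columns: Dict[str, List[int]], labels: List[int],
         threshold: float = 0.01) -> List[str]:
    """Growth phase: repeatedly add the feature with the largest
    I(feature; label | selected) until no candidate exceeds the threshold."""
    selected: List[str] = []
    while True:
        z = _context(columns, selected, len(labels))
        best, best_gain = None, threshold
        for name, col in columns.items():
            if name in selected:
                continue
            gain = cond_mutual_info(col, labels, z)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            return selected
        selected.append(best)

def shrink(columns: Dict[str, List[int]], labels: List[int],
           selected: List[str], threshold: float = 0.01) -> List[str]:
    """Shrink phase: drop any feature that adds at most `threshold` bits
    about the label given the other features that are still kept."""
    kept = list(selected)
    for name in list(kept):
        others = [f for f in kept if f != name]
        z = _context(columns, others, len(labels))
        if cond_mutual_info(columns[name], labels, z) <= threshold:
            kept.remove(name)
    return kept

def grow_shrink(columns: Dict[str, List[int]], labels: List[int],
                threshold: float = 0.01) -> List[str]:
    return shrink(columns, labels, grow(columns, labels, threshold), threshold)

# Toy data (made up for illustration only).
y = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],  # fully relevant to y
    "b": [0, 0, 1, 1, 0, 0, 1, 1],  # exact copy of "a" (redundant)
    "c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of y (irrelevant)
}
print(grow_shrink(features, y))         # ['a']
print(shrink(features, y, ["a", "b"]))  # ['b']
```

The second call shows the shrink step on its own: given `b`, feature `a` adds no information about the label, so it is discarded as redundant. In a distributed setting, the per-candidate scoring loop in `grow` is the natural part to parallelize, though how the paper partitions that work on Spark is not described in the abstract.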