OCFSII: A New Feature Selection Based on Orthogonal Centroid both Inter-class and Intra-class for Vulnerability Classification

With the rapid development of information technology, vulnerabilities have become a major threat to network security management. Vulnerability classification plays a vital role throughout the vulnerability management process, and selecting proper features to represent each category is key to it. Because some common feature selection algorithms have low efficiency and accuracy, in this paper we propose a new method called OCFSII, which measures the importance of feature terms both between classes (inter-class) and within classes (intra-class) based on the orthogonal centroid. We evaluated the method on the vulnerability database using two classifiers, KNN and SVM. The experimental results show that OCFSII outperforms Information Gain (IG), Document Frequency (DF), and Orthogonal Centroid (OC), and is comparable with the Improved Gini index (IGI) when KNN is used, while with SVM it is superior to all four algorithms. In addition, OCFSII improves upon OC.
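For reference, the inter-class orthogonal-centroid (OC) score that OCFSII builds on can be sketched as follows. This assumes the OCFS formulation of Yan et al. (SIGIR '05), where feature i is scored by s(i) = Σ_j (n_j / n)(m_j[i] − m[i])², with m_j the centroid of class j and m the global centroid, and the top-k features are kept. The function names `oc_scores` and `select_top_k` and the toy data are hypothetical. The abstract does not give the formula for OCFSII's intra-class term, so this sketch covers only the OC baseline, not the proposed method itself.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def oc_scores(docs: Sequence[Sequence[float]], labels: Sequence[str]) -> List[float]:
    """Hypothetical helper: inter-class orthogonal-centroid score per feature.

    s(i) = sum over classes j of (n_j / n) * (m_j[i] - m[i]) ** 2,
    where m_j is the centroid of class j and m is the global centroid
    (OCFS-style scoring; OCFSII's intra-class term is NOT included).
    """
    n = len(docs)
    dim = len(docs[0])
    # Global centroid m over all documents
    m = [sum(d[i] for d in docs) / n for i in range(dim)]
    # Group document vectors by class label
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for d, y in zip(docs, labels):
        by_class[y].append(d)
    scores = [0.0] * dim
    for members in by_class.values():
        n_j = len(members)
        for i in range(dim):
            # Centroid of class j on feature i
            m_ji = sum(d[i] for d in members) / n_j
            # Class-size-weighted squared distance to the global centroid
            scores[i] += (n_j / n) * (m_ji - m[i]) ** 2
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Hypothetical helper: indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy example (made-up data): 4 documents, 3 term features, 2 classes
docs = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
labels = ["a", "a", "b", "b"]
scores = oc_scores(docs, labels)
print(scores)                   # [0.25, 0.25, 0.0]
print(select_top_k(scores, 2))  # [0, 1]
```

In the toy data, term 2 occurs equally often in both classes, so its class centroids coincide with the global centroid and it scores 0; terms 0 and 1 separate the classes and are selected. An intra-class measure, as OCFSII adds, would additionally account for how consistently a term appears within each class.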
