SVM Classifier Incorporating Feature Selection Using GA for Spam Detection

The use of SVM (Support Vector Machines) in detecting e-mail as spam or nonspam by incorporating feature selection using GA (Genetic Algorithm) is investigated. An GA approach is adopted to select features that are most favorable to SVM classifier, which is named as GA-SVM. Scaling factor is exploited to measure the relevant coefficients of feature to the classification task and is estimated by GA. Heavy-bias operator is introduced in GA to promote sparse in the scaling factors of features. So, feature selection is performed by eliminating irrelevant features whose scaling factor is zero. The experiment results on UCI Spam database show that comparing with original SVM classifier, the number of support vector decreases while better classification results are achieved based on GA-SVM.

[1]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[2]  Lawrence Carin,et al.  A Bayesian approach to joint feature selection and classifier design , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  William W. Cohen Learning Rules that Classify E-Mail , 1996 .

[4]  Harris Drucker,et al.  Support vector machines for spam categorization , 1999, IEEE Trans. Neural Networks.

[5]  Yves Grandvalet,et al.  Adaptive Scaling for Feature Selection in SVMs , 2002, NIPS.

[6]  Susan T. Dumais,et al.  A Bayesian Approach to Filtering Junk E-Mail , 1998, AAAI 1998.

[7]  Joachim M. Buhmann,et al.  Feature selection for support vector machines , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[8]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[9]  Sayan Mukherjee,et al.  Choosing Multiple Parameters for Support Vector Machines , 2002, Machine Learning.

[10]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[11]  Lalit M. Patnaik,et al.  Genetic algorithms: a survey , 1994, Computer.

[12]  Alain Rakotomamonjy,et al.  Variable Selection Using SVM-based Criteria , 2003, J. Mach. Learn. Res..

[13]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..