E-mail Spam Filtering Based on Support Vector Machines with Taguchi Method for Parameter Selection

Support vector machines (SVMs) are a powerful classification technique in data mining and have been successfully applied to many real-world problems. The choice of SVM parameters strongly affects the classification performance obtained from training. However, these parameters are usually chosen by experience or by grid search (GS). In this study, we use the Taguchi method to approximate the optimal parameters of an SVM-based e-mail spam filtering model. Six real-world mail data sets are used to demonstrate the effectiveness and feasibility of the method. The results show that the Taguchi method can find an effective model with high classification accuracy.
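The abstract does not spell out the procedure, so the following is only a minimal sketch, under stated assumptions, of how a Taguchi design can stand in for grid search when tuning an RBF-kernel SVM. It uses the standard L9(3^4) orthogonal array, scores each run with the "larger-the-better" signal-to-noise ratio S/N = -10 log10((1/n) Σ 1/y_i²) over per-fold accuracies y_i, and picks for each factor the level with the highest mean S/N (its main effect). Treating log2 C and log2 gamma as the factors, and shrinking the level spacing over several rounds, are assumptions for illustration and not necessarily the paper's method. `fake_fold_accuracies` is a hypothetical stand-in for real cross-validation on mail data.

```python
import math

# Standard Taguchi L9(3^4) orthogonal array (levels coded 0, 1, 2).
# Every pair of columns contains each of the 9 level combinations exactly once.
L9 = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (0, 2, 2, 2),
    (1, 0, 1, 2),
    (1, 1, 2, 0),
    (1, 2, 0, 1),
    (2, 0, 2, 1),
    (2, 1, 0, 2),
    (2, 2, 1, 0),
]


def sn_larger_is_better(values):
    """Taguchi 'larger-the-better' S/N ratio in dB: -10*log10(mean(1/y^2))."""
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in values) / len(values))


def taguchi_search(evaluate, centers, steps, rounds=3, shrink=0.5):
    """Choose factor levels with an L9 array, refining the level spacing each round.

    evaluate(*params) must return a list of positive scores (e.g. CV fold accuracies).
    Factor j is tested at centers[j] - steps[j], centers[j], centers[j] + steps[j].
    The multi-round shrinking is an assumed refinement, not part of the basic method.
    """
    k = len(centers)
    if not 1 <= k <= 4:
        raise ValueError("L9 supports 1 to 4 three-level factors")
    centers, steps = list(centers), list(steps)
    for _ in range(rounds):
        levels = [[c + (lv - 1) * s for lv in range(3)] for c, s in zip(centers, steps)]
        # One experiment per row of the orthogonal array; unused columns are ignored.
        sn = [sn_larger_is_better(evaluate(*(levels[j][row[j]] for j in range(k))))
              for row in L9]
        # Main effect: mean S/N over the 3 runs in which factor j sits at level lv.
        for j in range(k):
            effects = [sum(s for s, row in zip(sn, L9) if row[j] == lv) / 3.0
                       for lv in range(3)]
            centers[j] = levels[j][effects.index(max(effects))]
        steps = [s * shrink for s in steps]
    return centers


def fake_fold_accuracies(log2_c, log2_gamma):
    """Hypothetical stand-in for 5-fold CV accuracy; peaks at log2 C = 3, log2 gamma = -5."""
    base = 0.95 * math.exp(-0.01 * ((log2_c - 3) ** 2 + (log2_gamma + 5) ** 2))
    return [base * (1 + d) for d in (-0.02, -0.01, 0.0, 0.01, 0.02)]


best_log2_c, best_log2_gamma = taguchi_search(fake_fold_accuracies, centers=(0, 0), steps=(3, 3))
print("C =", 2 ** best_log2_c, "gamma =", 2 ** best_log2_gamma)
```

With only two factors, a single L9 pass visits the same 9 points as a 3×3 grid; the array pays off when up to four factors share the same 9 runs (instead of 81), or when each round re-centres the levels as above. In a real filter, `evaluate` would train the SVM on the mail features and return per-fold accuracies; with scikit-learn, for example, `cross_val_score(SVC(C=2**log2_c, gamma=2**log2_gamma), X, y, cv=5)` returns such an array.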
