A Comparison of Support Vector Machines, Memory-based and Naïve Bayes Techniques on Spam Recognition

This paper presents a comparison of support vector machine (SVM), memory-based learning (MBL) and Naïve Bayes (NB) techniques for classifying e-mail messages as legitimate or spam. Although a number of comparative studies of spam filtering methods exist, most of them evaluate the methods on different data sets. To assess the effectiveness of the SVM, MBL and NB methods under common conditions, we use a single publicly available corpus (LINGSPAM). Since the MBL and NB methods have previously been tested on this corpus, the best parameters reported in those studies are reused in our experiments with only minor changes. For the SVMs, by contrast, extensive experiments are carried out to determine the best attribute dimensions. The results show that SVM performs significantly better in the no-cost and high-cost settings, whereas NB performs best when the cost of misclassifying a legitimate message is extremely high.
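To make the kind of comparison described above concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes scikit-learn, a toy placeholder message list in place of the LINGSPAM corpus, bag-of-words attributes, and arbitrary classifier settings, and simply cross-validates an SVM, a memory-based (k-NN) learner and a Naïve Bayes classifier on the same features.

```python
# Illustrative sketch only: the messages, labels, feature dimension and
# classifier parameters below are placeholders, not the paper's setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB

messages = [
    "cheap meds buy now", "win a free prize today",
    "limited offer click here",
    "meeting moved to friday", "lunch at noon tomorrow",
    "please review the attached report",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = legitimate

# Bag-of-words attributes; max_features stands in for the "attribute dimension".
X = CountVectorizer(max_features=1000).fit_transform(messages)

classifiers = [
    ("SVM", LinearSVC()),
    ("MBL (k-NN)", KNeighborsClassifier(n_neighbors=3)),
    ("Naive Bayes", MultinomialNB()),
]

for name, clf in classifiers:
    # Stratified cross-validation on the shared feature matrix.
    scores = cross_val_score(clf, X, labels, cv=3)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

In a study like the one summarized here, the accuracy printout would be replaced by cost-sensitive measures (e.g. weighted accuracy at different misclassification-cost ratios), and the feature dimension would be tuned per classifier rather than fixed.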