Paper information - An empirical study of three machine learning methods for spam filtering

The increasing volume of unsolicited bulk e-mail (also known as spam) is a growing annoyance for most Internet users. Using a classifier based on a specific machine-learning technique to filter out spam e-mail automatically has drawn many researchers' attention. This paper presents a comparative study of the performance of three commonly used machine learning methods in spam filtering. In addition, we try to integrate two spam filtering methods to obtain better performance. A set of systematic experiments has been conducted in which these methods are applied to different parts of an e-mail. The experiments show that using the header alone can achieve satisfactory performance, and that integrating disparate methods is a promising way to fight spam.

Chih-Chin Lai
