Paper information - Effective spam classification based on meta-heuristics

Effective spam classification based on meta-heuristics

Using machine learning techniques such as naive Bayes, decision trees and support vector machines to automatically filter out spam e-mails has drawn many researchers' attention. Previous methods use keywords contained in e-mails to extract binary features from the corpus. However, since the keywords of e-mails change from time to time, the performance of keyword-based solutions is not stable. In this study, we use the behaviors of spammers as the features for classifying e-mails. Such behaviors are first described by meta-heuristics and then used as features of e-mails for classification. A total of 113 new features are extracted from the given meta-heuristics. Using existing machine learning techniques, the filtering performance is much better than that of keyword-based filtering. In addition, the training time is substantially reduced because of the low-dimensional feature space and sparse feature vectors.
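To make the idea of behavior-based features concrete, the following is a minimal sketch of how rule-style heuristics could be turned into a sparse binary feature vector. The four rules, their names, and the sample messages are hypothetical illustrations; the abstract does not list the 113 meta-heuristics, so none of this should be read as the authors' actual rule set or implementation.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Callable, List

# Hypothetical behavior rules; each one inspects the message and returns
# True when the (assumed) spammer behavior is present.

def subject_all_caps(msg: Message) -> bool:
    """Subject contains letters and every letter is upper case."""
    letters = [c for c in msg.get("Subject", "") if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

def missing_to_header(msg: Message) -> bool:
    """No To header, as is common in bulk mail sent via Bcc."""
    return msg.get("To") is None

def reply_to_differs(msg: Message) -> bool:
    """Reply-To address is present and differs from the From address."""
    reply_to = msg.get("Reply-To")
    if reply_to is None:
        return False
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    return parseaddr(reply_to)[1].lower() != from_addr

def many_exclamations(msg: Message) -> bool:
    """Subject contains three or more exclamation marks."""
    return msg.get("Subject", "").count("!") >= 3

# The position of a rule in this list is its feature index.
RULES: List[Callable[[Message], bool]] = [
    subject_all_caps,
    missing_to_header,
    reply_to_differs,
    many_exclamations,
]

def extract_features(raw_email: str) -> List[int]:
    """Return the indices of the rules that fire (a sparse binary vector)."""
    msg = message_from_string(raw_email)
    return [i for i, rule in enumerate(RULES) if rule(msg)]

# Made-up sample messages for illustration only.
SPAM_EXAMPLE = (
    "From: Deals <deals@example.com>\n"
    "Reply-To: other@example.net\n"
    "Subject: BUY NOW!!!\n"
    "\n"
    "Limited offer.\n"
)
HAM_EXAMPLE = (
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: Lunch on Friday?\n"
    "\n"
    "Are you free at noon?\n"
)

if __name__ == "__main__":
    print(extract_features(SPAM_EXAMPLE))
    print(extract_features(HAM_EXAMPLE))
```

Storing only the indices of the rules that fire keeps each vector short when few rules match, which is one plausible reading of why a small, sparse feature space would shorten training. The resulting index lists could then be passed to any standard classifier, such as the naive Bayes, decision tree, or support vector machine learners named in the abstract.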

Chi-Yuan Yeh | Chih-Hung Wu | Shing-Hwang Doong

[1] Sanjay P. Ahuja, et al. Anti-Spam Filtering Using Neural Networks, 2004, IC-AI.

[2] Susan T. Dumais, et al. A Bayesian Approach to Filtering Junk E-Mail, 1998, AAAI 1998.

[3] Gerard Salton, et al. On the Specification of Term Values in Automatic Indexing, 1973.

[4] J. Ross Quinlan, et al. Induction of Decision Trees, 1986, Machine Learning.

[5] Lluís Màrquez i Villodre, et al. Boosting Trees for Anti-Spam Email Filtering, 2001, ArXiv.

[6] Vladimir Vapnik, et al. Statistical Learning Theory, 1998.

[7] Hongyuan Zha, et al. Exploring Support Vector Machines and Random Forests for Spam Detection, 2004, CEAS.

[8] Constantine D. Spyropoulos, et al. An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-Mail Messages, 2000, SIGIR '00.

[9] Harris Drucker, et al. Support Vector Machines for Spam Categorization, 1999, IEEE Trans. Neural Networks.

[10] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[11] C. H. Wu, et al. Detection of Spam E-Mails by Analyzing the Distributing Behaviors of E-Mail Servers, 2003, HIS.

[12] Georgios Paliouras, et al. A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists, 2004, Information Retrieval.

[13] Mads Haahr, et al. A Case-Based Approach to Spam Filtering that Can Track Concept Drift, 2003.