Effective spam classification based on meta-heuristics

Using machine learning techniques such as naive Bayes, decision trees, and support vector machines to automatically filter spam e-mails has drawn the attention of many researchers. Previous methods extract binary features from the corpus based on the keywords contained in e-mails. However, because the keywords found in e-mails change over time, the performance of keyword-based solutions is not stable. In this study, we use the behaviors of spammers as features for classifying e-mails. These behaviors are first described by meta-heuristics, which then serve as the features of each e-mail for classification. A total of 113 new features are extracted from the given meta-heuristics. With existing machine learning techniques, filtering based on these features performs much better than keyword-based filtering. In addition, the training time is substantially reduced because the feature space is low-dimensional and the feature vectors are sparse.
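
To make the idea of behavior-based features concrete, the following is a minimal sketch, not the feature set used in this study. The four rules, their names, and the sample messages are hypothetical illustrations; the 113 meta-heuristics themselves are not listed here. Each rule inspects how a message was composed or sent rather than which keywords it contains, and a message is represented sparsely by the indices of the rules that fire.

```python
import re
from email import message_from_string

# Hypothetical behavior rules, for illustration only. They are NOT the
# 113 meta-heuristics of the study. Each rule takes the parsed message
# and its body text and returns True when the behavior is observed.
RULES = [
    ("subject_all_caps",
     lambda msg, body: msg.get("Subject", "").isupper()),
    ("missing_message_id",
     lambda msg, body: msg.get("Message-ID") is None),
    ("reply_to_differs_from_sender",
     lambda msg, body: msg.get("Reply-To") is not None
     and msg.get("Reply-To") != msg.get("From")),
    ("body_has_many_links",
     lambda msg, body: len(re.findall(r"https?://", body)) >= 3),
]


def extract_features(raw_email):
    """Return the set of indices of the rules that fire (a sparse binary vector)."""
    msg = message_from_string(raw_email)
    payload = msg.get_payload()
    # Multipart messages return a list; this sketch only reads plain bodies.
    body = payload if isinstance(payload, str) else ""
    return {i for i, (_, rule) in enumerate(RULES) if rule(msg, body)}


# Made-up sample messages (assumptions, not data from the study).
SPAM = (
    "From: a@example.com\n"
    "Reply-To: b@example.net\n"
    "Subject: FREE MONEY NOW\n"
    "\n"
    "Visit http://x.example http://y.example http://z.example\n"
)
HAM = (
    "From: c@example.com\n"
    "Message-ID: <1@example.com>\n"
    "Subject: Lunch tomorrow?\n"
    "\n"
    "See you at noon.\n"
)

if __name__ == "__main__":
    for label, raw in (("spam", SPAM), ("ham", HAM)):
        fired = sorted(extract_features(raw))
        print(label, [RULES[i][0] for i in fired])
```

Because only the fired rule indices are stored, each vector stays small even as more rules are added, which is consistent with the low-dimensional, sparse representation described above. The resulting vectors could be passed to any standard classifier such as naive Bayes, a decision tree, or a support vector machine.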