An Expanded Feature Extraction of E-Mail Header for Spam Recognition

One current spam-filtering approach extracts attributes from e-mail headers and uses machine learning methods to classify the sample sets. Over time, however, spammers change the ways they send spam, which substantially alters the headers of spam messages, so attributes defined in the past cannot cope adequately with these changes. This paper extracts attributes from all header fields that can be forged in order to expand the feature set, then uses rough set theory to classify the sample sets. Experiments show that including more attributes in the feature set can lead to better performance than other algorithms, in terms of higher recall and precision and a lower false-recognition rate.
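To make the idea of header-based attributes concrete, the following is a minimal Python sketch of how a few such attributes might be read from a raw message using the standard library. The function names (`extract_header_features`, `domain_of`, `date_is_valid`) and the particular attributes chosen are assumptions made for illustration only; they are not the attribute set defined in this paper, and the rough set classification step is not shown.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def domain_of(address_header):
    """Return the lower-cased domain of an address header, or '' if none is found."""
    _, addr = parseaddr(address_header or "")
    return addr.rpartition("@")[2].lower() if "@" in addr else ""


def date_is_valid(date_header):
    """Return True if the Date header parses as an RFC 5322 style date."""
    if not date_header:
        return False
    try:
        return parsedate_to_datetime(date_header) is not None
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return False


def extract_header_features(raw_message):
    """Map a raw e-mail to a dict of illustrative (hypothetical) header attributes."""
    msg = message_from_string(raw_message)
    from_domain = domain_of(msg.get("From"))
    return_domain = domain_of(msg.get("Return-Path"))
    return {
        # Number of relay hops recorded in the message.
        "received_count": len(msg.get_all("Received") or []),
        # Fields that legitimate mail software normally supplies.
        "has_message_id": msg.get("Message-ID") is not None,
        "has_to": msg.get("To") is not None,
        "has_valid_date": date_is_valid(msg.get("Date")),
        # A mismatch between sender domains can hint at a forged header.
        "from_matches_return_path": bool(from_domain) and from_domain == return_domain,
    }


if __name__ == "__main__":
    sample = (
        "Return-Path: <bounce@mailer.example.net>\n"
        "From: Alice <alice@example.com>\n"
        "To: bob@example.org\n"
        "Subject: hello\n"
        "\n"
        "body\n"
    )
    print(extract_header_features(sample))
```

Each message becomes one row of discrete attribute values, which is the kind of decision-table input that a rough set classifier, or any other learner, could then consume.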