Paper information - Extracting discriminative information from e-mail for spam detection inspired by Immune System


Inspired by the biological immune system, we propose a local concentration (LC) based feature extraction approach for anti-spam filtering. A general anti-spam model is built that combines the LC approach with term selection methods and classifiers. In the LC model, a sliding window divides each message into local areas. In each area, a two-dimensional feature is constructed by calculating the concentrations of spam and legitimate email. The features of all areas are then concatenated into a single feature vector. Experiments are conducted on four benchmark corpora using 10-fold cross-validation. The results show that the LC approach can extract effective position-correlated information from messages. Compared with the prevalent Bag-of-Words approach, the LC approach performs better in terms of both accuracy and F1 measure. Most significantly, it greatly reduces feature dimensionality and runs much faster.
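
The abstract gives only the outline of the LC feature construction, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It assumes that term selection has already produced two term sets (the hypothetical `spam_terms` and `ham_terms`), that the message is split into a fixed number of equal, non-overlapping areas (`num_windows`, also hypothetical), and that a "concentration" is the fraction of distinct tokens in an area that appear in the corresponding term set. The paper may define the window and the concentrations differently.

```python
from typing import List, Set


def local_concentrations(
    tokens: List[str],
    spam_terms: Set[str],
    ham_terms: Set[str],
    num_windows: int = 3,
) -> List[float]:
    """Return a feature vector of length 2 * num_windows.

    Assumption: each area contributes (spam concentration, legitimate
    concentration), both computed over the distinct tokens in that area.
    """
    features: List[float] = []
    n = len(tokens)
    for i in range(num_windows):
        # Assumption: equal-sized, non-overlapping areas.
        start = i * n // num_windows
        end = (i + 1) * n // num_windows
        area = set(tokens[start:end])
        if not area:
            # An empty area carries no evidence for either class.
            features.extend([0.0, 0.0])
            continue
        spam_conc = len(area & spam_terms) / len(area)
        ham_conc = len(area & ham_terms) / len(area)
        features.extend([spam_conc, ham_conc])
    return features


# Toy example with made-up term sets (not taken from any corpus).
tokens = "free money now click here meeting agenda attached for monday".split()
spam_terms = {"free", "money", "click"}
ham_terms = {"meeting", "agenda"}
features = local_concentrations(tokens, spam_terms, ham_terms, num_windows=2)
print(features)  # [0.6, 0.0, 0.0, 0.4]
```

Under these assumptions the vector length depends only on the number of areas (here 4 values), not on the vocabulary size. This is consistent with the dimensionality reduction relative to Bag-of-Words that the abstract reports.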

Ying Tan | Yuanchun Zhu
