Machine Learning Methods for E-mail Classification

The increasing volume of unsolicited bulk e-mail (also known as spam) has created a need for reliable anti-spam filters. Using classifiers based on machine learning techniques to filter out spam e-mail automatically has drawn the attention of many researchers. In this paper we review some of the most popular machine learning methods (Bayesian classification, k-NN, ANNs, SVMs, artificial immune systems, and rough sets) and their applicability to the problem of spam e-mail classification. We describe the algorithms and compare their performance on the SpamAssassin spam corpus.
