Email Spam Classification by Support Vector Machine

Traditionally, spam filtering relied on techniques such as blacklists and whitelists, but given the current state of the Internet these methods are becoming obsolete. As the Internet grows in popularity, it is difficult to build a spam filter that automatically and effectively separates spam from legitimate mail before it reaches the inbox and fills it up. Many computer scientists have worked on machine learning algorithms based on statistical learning methods to tackle this problem. A major concern today is to build a spam filter that efficiently catches spam messages in all the varieties they come in while still performing at a high rate. Within machine learning, the support vector machine (SVM) can play a major role in spam detection and filtering; however, the performance of an SVM depends heavily on the choice of kernel. In this paper, we evaluate the performance of SVM-based classifiers with two different kernel functions, a linear kernel and a nonlinear Gaussian kernel, on the SpamAssassin public corpus. Furthermore, we compare the training and testing accuracy of the two kernels on this dataset and attempt to explain which kernel behaves better with which kind of dataset. Finally, we test our classifier on emails taken from a Gmail inbox and spam folder.
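The abstract compares a linear kernel, K(x, z) = x . z, with a Gaussian (RBF) kernel, K(x, z) = exp(-gamma * ||x - z||^2). The sketch below shows how the kernel choice plugs into an SVM and how training and test accuracy are measured. It is an illustration under stated assumptions, not the authors' implementation: the paper does not say which SVM solver it uses, so this swaps in kernelized Pegasos (a stochastic sub-gradient solver for a bias-free hinge-loss SVM). The class `KernelPegasosSVM`, the four word-count features, the toy e-mails, gamma = 0.5, lambda = 0.01 and the iteration count are all hypothetical and unrelated to the SpamAssassin corpus.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """Linear kernel: K(x, z) = x . z."""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(gamma: float) -> Kernel:
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    def kernel(x: Vector, z: Vector) -> float:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
        return math.exp(-gamma * sq_dist)
    return kernel


class KernelPegasosSVM:
    """Hypothetical bias-free kernel SVM trained with kernelized Pegasos.

    Each step samples one example and, if its hinge loss is active, increments
    that example's coefficient alpha_i. The decision function is
    f(x) = sum_j alpha_j * y_j * K(x_j, x), up to a positive scale factor.
    """

    def __init__(self, kernel: Kernel, lam: float = 0.01,
                 iterations: int = 500, seed: int = 0) -> None:
        self.kernel = kernel
        self.lam = lam                # regularization strength (lambda)
        self.iterations = iterations  # number of stochastic steps T
        self.seed = seed
        self.X: List[Vector] = []
        self.y: List[int] = []
        self.alpha: List[int] = []

    def _score(self, x: Vector) -> float:
        # Unscaled decision value: sum_j alpha_j * y_j * K(x_j, x).
        return sum(a * yj * self.kernel(xj, x)
                   for a, yj, xj in zip(self.alpha, self.y, self.X) if a)

    def fit(self, X: List[Vector], y: List[int]) -> "KernelPegasosSVM":
        self.X, self.y = list(X), list(y)
        self.alpha = [0] * len(self.X)
        rng = random.Random(self.seed)
        for t in range(1, self.iterations + 1):
            i = rng.randrange(len(self.X))
            # Margin of example i under the current iterate w_t = score / (lambda * t).
            margin = self.y[i] * self._score(self.X[i]) / (self.lam * t)
            if margin < 1:  # hinge loss active: sub-gradient step on alpha_i
                self.alpha[i] += 1
        return self

    def predict(self, x: Vector) -> int:
        # +1 = spam, -1 = ham (hypothetical label convention).
        return 1 if self._score(x) > 0 else -1


def accuracy(model: KernelPegasosSVM, X: List[Vector], y: List[int]) -> float:
    return sum(model.predict(x) == label for x, label in zip(X, y)) / len(y)


# Hypothetical counts of the tokens ["free", "winner", "meeting", "project"].
X_train = [[3, 2, 0, 0], [2, 3, 0, 0], [2, 2, 0, 0],   # spam
           [0, 0, 2, 3], [0, 0, 3, 2], [0, 0, 2, 2]]   # ham
y_train = [1, 1, 1, -1, -1, -1]
X_test = [[4, 3, 0, 0], [0, 0, 3, 4]]
y_test = [1, -1]

if __name__ == "__main__":
    for name, kern in [("linear", linear_kernel),
                       ("gaussian", gaussian_kernel(gamma=0.5))]:
        model = KernelPegasosSVM(kern).fit(X_train, y_train)
        print(name, "train acc:", accuracy(model, X_train, y_train),
              "test acc:", accuracy(model, X_test, y_test))
```

Running the script prints training and test accuracy for each kernel. Because the toy data is deliberately easy to separate, both kernels should classify every example correctly, so the sketch shows the mechanics of swapping kernels rather than the comparison reported in the paper. In the Gaussian kernel, gamma controls how quickly similarity decays with distance: larger values make the classifier more local and more prone to overfitting.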