Spam filtering techniques and MapReduce with SVM: A study

Spam is the most dangerous threat to email systems today. Spam is any unwanted and harmful mail, so separating it from legitimate mail is essential. This paper surveys spam filtering techniques, the problems encountered in training a Support Vector Machine (SVM), and the need to introduce MapReduce (Hadoop) for SVM training. Techniques for separating spam fall into four groups: word based, content based, machine learning based, and hybrid. Machine learning techniques are the most popular because of their high accuracy and mathematical foundation. SVM is the most widely used machine learning technique in spam filtering because of its ability to handle data with a large number of attributes. Training an SVM faces two hurdles: it requires a large amount of time, and a large dataset cannot be given as input. Both problems can be addressed by implementing the training algorithm on the MapReduce (Hadoop) framework, which gives a speedup of up to 6 times over the sequential algorithm.
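To make the idea of training an SVM in a map and a reduce phase concrete, the following is a minimal single-machine sketch, not the algorithm evaluated in this study. It assumes one possible scheme: each map task trains a linear SVM on its own data split with Pegasos-style subgradient descent, and the reduce step merges the partial models by a count-weighted average. The function names, the two-feature toy data (counts of spam-like and legitimate-looking words), and the hyperparameters are all hypothetical, and no Hadoop API is used; on a cluster the map calls would run in parallel on separate splits.

```python
import random
from functools import reduce


def train_linear_svm(samples, epochs=20, lam=0.01, seed=0):
    """Train a linear SVM (no bias term) with Pegasos-style subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = samples[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink the weights (regularization part of the subgradient).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss part: only samples inside the margin contribute.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def map_partition(partition):
    """Map step: train on one data split and emit (weights, sample_count)."""
    return train_linear_svm(partition), len(partition)


def reduce_models(a, b):
    """Reduce step: merge two partial models by count-weighted averaging."""
    (wa, na), (wb, nb) = a, b
    merged = [(na * x + nb * y) / (na + nb) for x, y in zip(wa, wb)]
    return merged, na + nb


def predict(w, x):
    """Return +1 (spam) or -1 (legitimate) for feature vector x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


# Toy splits: features are [spam_word_count, legit_word_count]; label +1 = spam.
partitions = [
    [([3.0, 0.0], 1), ([0.0, 2.0], -1), ([5.0, 0.0], 1), ([0.0, 4.0], -1)],
    [([2.0, 0.0], 1), ([0.0, 3.0], -1), ([4.0, 0.0], 1), ([0.0, 1.0], -1)],
]

# The built-in map stands in for the parallel map phase of the framework.
weights, total = reduce(reduce_models, map(map_partition, partitions))
```

Because each map call sees only its own split, no single task needs the whole dataset in memory, which is the property that addresses the two training hurdles named above. Averaging the weights is only one way to combine partial models; schemes that pass support vectors between rounds are another, and this sketch makes no claim about which one a given implementation uses.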