Paper Information - Comparative Study of Classification Algorithms for Spam Email Detection

Email spam has become a major problem. Spam messages consume storage space and network bandwidth and are of no use to the receiver. Filtering spam is difficult because spammers actively try to circumvent the filtering mechanism. Various classification algorithms are used to classify a message as spam or non-spam (ham). This paper compares and discusses the effectiveness of four machine learning classification algorithms from different categories (probabilistic, decision tree, support vector machine, and lazy algorithms) on the basis of various performance measures, using WEKA, a data mining tool, to analyze the algorithms. The Enron dataset is taken in preprocessed form from Athens University of Economics and Business, and the J48 and BayesNet algorithms are found to perform better than SVM.

Aakanksha Sharaff | Naresh Kumar Nagwani | Abhishek Dhadse
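
The abstract compares classifiers "on the basis of various performance measures" without listing them. As a minimal sketch, assuming the measures include the usual accuracy, precision, recall and F-measure with spam as the positive class, the Python below shows how such figures are derived from a classifier's predictions. The function names and the label lists are hypothetical and purely illustrative; they are not the paper's code, data or results, and the paper itself uses WEKA rather than Python.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary spam/ham classifier (spam = positive class)."""
    tp: int  # spam correctly labelled spam
    fp: int  # ham wrongly labelled spam
    fn: int  # spam wrongly labelled ham
    tn: int  # ham correctly labelled ham


def count_outcomes(actual, predicted, positive="spam"):
    """Tally the confusion matrix from parallel lists of true and predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = ConfusionCounts(tp=0, fp=0, fn=0, tn=0)
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                counts.tp += 1
            else:
                counts.fp += 1
        else:
            if truth == positive:
                counts.fn += 1
            else:
                counts.tn += 1
    return counts


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a measure is undefined."""
    return numerator / denominator if denominator else 0.0


def measures(c):
    """Compute common performance measures from confusion-matrix counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Made-up labels for illustration only; not taken from the Enron dataset.
    actual = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
    predicted = ["spam", "spam", "ham", "ham", "ham", "spam", "ham", "spam"]
    for name, value in measures(count_outcomes(actual, predicted)).items():
        print(f"{name}: {value:.2f}")
```

Computing the same measures for each classifier on the same test split is what makes a comparison such as J48 versus SVM meaningful. For spam filtering, precision matters in particular because a false positive means a legitimate message is discarded.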