Comparative Analysis of Classification Algorithms for Email Spam Detection

The growing use of email for everyday business transactions and general communication, driven by its cost-effectiveness and efficiency, has made email a target for attacks, including spamming. Spam emails, also called junk emails, are unsolicited and nearly identical messages sent indiscriminately to multiple recipients. This study analyses the performance of the following classification algorithms: Bayesian Logistic Regression, Hidden Naïve Bayes, Radial Basis Function (RBF) Network, Voted Perceptron, Lazy Bayesian Rule, Logit Boost, Rotation Forest, NNge, Logistic Model Tree, REP Tree, Naïve Bayes, Multilayer Perceptron, Random Tree and J48. The performance of the algorithms was measured in terms of Accuracy, Precision, Recall, F-Measure, Root Mean Squared Error, Receiver Operating Characteristic (ROC) Area and Root Relative Squared Error, using the WEKA data mining tool. To give a balanced view of the algorithms' performance, no feature selection or performance-boosting method was employed. The results suggest that a number of existing classification algorithms, if combined with appropriate feature selection, would yield more accurate results for email classification. Rotation Forest gave the best accuracy, at 94.2%. Although none of the algorithms achieved 100% accuracy in sorting spam emails, Rotation Forest came closest to the most accurate result.
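As an illustration of the evaluation measures named above, the following is a minimal Python sketch of how Accuracy, Precision, Recall, F-Measure, Root Mean Squared Error and Root Relative Squared Error can be computed for a binary spam/non-spam task. It is not the study's code and does not use WEKA. The function names, the label encoding (1 for spam, 0 for non-spam) and the use of the mean of the actual labels as the RRSE baseline are assumptions made for this example, and WEKA's own computation may differ in detail.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-measure for the spam class (label 1).

    Hypothetical helper for illustration; labels are assumed to be 0 or 1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam flagged as spam
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ham kept as ham
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def error_metrics(y_true, y_score):
    """RMSE and RRSE of predicted scores against the actual 0/1 labels.

    RRSE divides the squared error by that of a baseline that always
    predicts the mean of the actual labels (an assumption of this sketch).
    """
    n = len(y_true)
    sq_err = sum((s - t) ** 2 for t, s in zip(y_true, y_score))
    mean_true = sum(y_true) / n
    baseline = sum((t - mean_true) ** 2 for t in y_true)

    rmse = math.sqrt(sq_err / n)
    rrse = math.sqrt(sq_err / baseline) if baseline else float("inf")
    return {"rmse": rmse, "rrse": rrse}


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the functions.
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(classification_metrics(actual, predicted))
    print(error_metrics(actual, [float(p) for p in predicted]))
```

An RRSE below 1 means the classifier's squared error is lower than that of the constant baseline, which is why it is reported alongside RMSE.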
