Performance Evaluation of Machine Learning Algorithms for Email Spam Detection

Sending large volumes of unwanted email poses a security threat to users. Despite the variety of security approaches available, spammers still create considerable vulnerability on the internet. This paper discusses efficient ways of using several popular algorithms to build a machine learning model that classifies an email as spam or ham. The Spambase Data Set from the UCI Machine Learning Repository is used for the experiments. The performance of five important machine learning classification algorithms, viz. Logistic Regression, Decision Tree, Naive Bayes, KNN, and SVM, is evaluated in order to train and build an effective model for email spam detection. The Weka tool is used for training and testing on the data set.
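As a rough illustration of what evaluating a spam/ham classifier can involve, the sketch below computes accuracy, precision, recall, and F1 from predicted and true labels. The abstract does not say which measures the paper reports, so the choice of metrics is an assumption. The function names `confusion_counts` and `evaluate` and the toy labels are hypothetical, and the code is plain Python rather than the Weka tool used in the paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall, and F1 for a binary spam/ham task."""
    c = confusion_counts(y_true, y_pred, positive)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels, not taken from the Spambase data set.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam", "ham", "spam"]

metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Applying the same function to the predictions of each of the five classifiers on a shared held-out split would give one comparable row of scores per algorithm. For spam filtering, precision on the spam class often deserves particular attention, since a false positive means a legitimate email is filtered out.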
