Classification of malicious emails

Owning and using an email address is an inherent part of everyday life and work with computers. The main aim of this paper is to analyze existing approaches to the classification of malicious emails. We implemented a system that distinguishes between legitimate and malicious emails and then classifies the malicious ones into three subcategories: spam, scam, and phishing. For this purpose, we prepared a labeled dataset and extracted several features from the emails it contains. Within the system, we implemented and evaluated four supervised machine learning methods: Random Forest, Decision Tree, Support Vector Machines, and k-Nearest Neighbors. According to our results, Random Forest is the most suitable of these methods for email classification.
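The two-stage design described above (first legitimate vs. malicious, then spam, scam, or phishing) can be illustrated with a minimal sketch. It assumes k-Nearest Neighbors, one of the four methods listed, as the classifier in both stages. The abstract does not say which features the authors extracted, so the four features, the keyword lists, the helper names (`extract_features`, `knn_predict`, `classify`) and the toy emails are all hypothetical. This is not the authors' implementation, and it says nothing about the accuracy reported in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists: the abstract does not name the features actually used.
URGENT_WORDS = {"urgent", "verify", "password", "account", "suspended"}
MONEY_WORDS = {"lottery", "inheritance", "prize", "million", "transfer"}
PROMO_WORDS = {"sale", "discount", "offer", "free", "buy"}


def extract_features(body: str) -> list[float]:
    """Map an email body to [links, urgency words, money words, promo words]."""
    words = re.findall(r"[a-z]+", body.lower())
    return [
        float(len(re.findall(r"https?://", body))),
        float(sum(w in URGENT_WORDS for w in words)),
        float(sum(w in MONEY_WORDS for w in words)),
        float(sum(w in PROMO_WORDS for w in words)),
    ]


def knn_predict(train, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up toy emails for illustration only; this is not the paper's dataset.
TOY_EMAILS = [
    ("Meeting moved to 3pm, see agenda attached", "legitimate"),
    ("Lunch tomorrow? Let me know", "legitimate"),
    ("Here are the notes from today's lecture", "legitimate"),
    ("Huge sale! Free shipping, discount offer, buy now http://shop.example", "spam"),
    ("Free offer: buy one get one, discount sale http://deals.example", "spam"),
    ("Limited sale, free gift with every buy http://promo.example", "spam"),
    ("You won the lottery prize, one million, send fee for transfer", "scam"),
    ("Inheritance of ten million awaits, pay transfer fee", "scam"),
    ("Claim your prize money: million dollar lottery transfer", "scam"),
    ("Urgent: verify your account password at http://login.example", "phishing"),
    ("Your account is suspended, urgent verify http://bank.example", "phishing"),
    ("Password expired, verify account urgent http://mail.example", "phishing"),
]

# Stage 1 learns legitimate vs. malicious; stage 2 learns the subcategory
# and is trained only on the malicious examples.
STAGE1 = [
    (extract_features(text), "legitimate" if label == "legitimate" else "malicious")
    for text, label in TOY_EMAILS
]
STAGE2 = [(extract_features(text), label) for text, label in TOY_EMAILS if label != "legitimate"]


def classify(body: str, k: int = 3) -> str:
    """Two-stage decision: filter out legitimate mail first, then pick a subcategory."""
    x = extract_features(body)
    if knn_predict(STAGE1, x, k) == "legitimate":
        return "legitimate"
    return knn_predict(STAGE2, x, k)


if __name__ == "__main__":
    # → phishing
    print(classify("Please verify your account password urgently at http://secure.example"))
```

Because stage 2 is trained only on malicious emails, the subcategory model never has to account for legitimate mail. To compare methods as the paper does, `knn_predict` could be replaced by other classifiers, for example scikit-learn's `RandomForestClassifier`, while the two-stage structure stays the same.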
