Whale optimization algorithm-based email spam feature selection method using rotation forest algorithm for classification

Email remains an integral part of daily life and a primary means of communication on the internet. Spam mail consumes large amounts of storage space and bandwidth, and existing spam filtering techniques have weaknesses, including the misclassification of genuine emails as spam (false positives); both are growing challenges for the internet community. This research proposes the use of a metaheuristic optimization algorithm, the whale optimization algorithm (WOA), to select salient features in the email corpus, and the rotation forest algorithm to classify emails as spam or non-spam. The entire datasets were used, and the rotation forest algorithm was evaluated before and after feature selection with WOA. The results show that, after feature selection with WOA, the rotation forest algorithm classified emails as spam or non-spam with an accuracy of 99.9% and a low false positive (FP) rate of 0.0019. This represents a remarkable improvement over some previous methods.
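To make the feature selection step concrete, the following is a minimal sketch of how WOA could drive a wrapper-style search over feature subsets. It is not the implementation used in this work. The abstract does not say how whale positions are turned into feature subsets, so the 0.5 threshold on positions kept in [0, 1], the population size, the iteration count, and the names `woa_select`, `toy_fitness`, and `INFORMATIVE` are all assumptions made for illustration. The fitness here is a toy stand-in; in a setup like the one described, it would presumably be the classifier's performance (for example, rotation forest accuracy) on the selected features.

```python
import math
import random

# Hypothetical ground truth for the toy fitness: only features 0 and 3 matter.
INFORMATIVE = {0, 3}


def toy_fitness(mask):
    """Toy stand-in for classifier accuracy: reward informative features,
    penalize the total number of selected features."""
    hits = sum(1 for j, on in enumerate(mask) if on and j in INFORMATIVE)
    return hits - 0.1 * sum(mask)


def woa_select(fitness, n_features, n_whales=10, n_iter=30, seed=0):
    """Search for a feature mask that maximizes fitness(mask) using WOA updates.

    Positions live in [0, 1]^n_features; a feature counts as selected when its
    coordinate exceeds 0.5 (an assumed binarization rule).
    """
    rng = random.Random(seed)

    def to_mask(pos):
        return [x > 0.5 for x in pos]

    def clip(x):
        return min(1.0, max(0.0, x))

    whales = [[rng.random() for _ in range(n_features)] for _ in range(n_whales)]
    best_pos = max(whales, key=lambda w: fitness(to_mask(w)))[:]
    best_score = fitness(to_mask(best_pos))

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for i, pos in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random one.
                ref = best_pos if abs(A) < 1 else rng.choice(whales)
                new = [clip(r - A * abs(C * r - x)) for r, x in zip(ref, pos)]
            else:
                # Spiral (bubble-net) move toward the best whale, spiral constant b = 1.
                l = rng.uniform(-1, 1)
                spiral = math.exp(l) * math.cos(2 * math.pi * l)
                new = [clip(abs(b - x) * spiral + b) for b, x in zip(best_pos, pos)]
            whales[i] = new
            score = fitness(to_mask(new))
            if score > best_score:
                best_pos, best_score = new[:], score

    return to_mask(best_pos), best_score


if __name__ == "__main__":
    mask, score = woa_select(toy_fitness, n_features=8, seed=1)
    print("selected feature indices:", [j for j, on in enumerate(mask) if on])
    print("fitness:", score)
```

Because the search is stochastic, the returned subset depends on the seed and is not guaranteed to be optimal. Comparing the classifier before and after selection, as described above, would amount to evaluating it once on all features and once on the features in the returned mask.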
