DTOF-ANN: An Artificial Neural Network phishing detection model based on Decision Tree and Optimal Features

Abstract

Phishing has recently emerged as one of the biggest threats in people's daily networking environments. Phishing attackers disguise malicious URLs as legitimate ones and use social engineering channels such as emails and SMS to steal users' private information, which calls for effective methods of preventing phishing attacks and reducing the losses they cause. Neural networks can be used to detect and prevent phishing attacks because of their strong ability to learn from massive datasets and their high accuracy in data classification. However, duplicate points in public datasets, together with negative and useless features in the feature vectors, can cause neural network training to over-fit, leaving the trained classifier weak at detecting phishing websites. To tackle this shortcoming, this paper proposes DTOF-ANN (Decision Tree and Optimal Features based Artificial Neural Network), a neural-network phishing detection model based on a decision tree and optimal feature selection. First, the traditional K-medoids clustering algorithm is improved with an incremental selection of initial centers and used to remove duplicate points from the public datasets. Then, an optimal feature selection algorithm, built on a newly defined feature evaluation index, a decision tree, and a local search method, is designed to prune negative and useless features. Finally, the optimal structure of the neural network classifier is constructed by properly adjusting its parameters, and the network is trained on the selected optimal features. Experimental results demonstrate that DTOF-ANN achieves higher performance than many existing methods.
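The abstract does not spell out how the improved K-medoids step removes duplicates, so the following is only a minimal sketch under stated assumptions: (1) Manhattan distance between feature vectors; (2) "incremental selection of initial centers" is read as a farthest-first rule, where each new initial medoid is the point farthest from the medoids already chosen; and (3) a duplicate is a sample whose feature vector and label both repeat an earlier sample in the same cluster. Because identical vectors are always assigned to the same medoid, checking cluster by cluster still catches every exact duplicate. All names (`incremental_init`, `k_medoids`, `remove_duplicates`) are hypothetical, not the authors' code.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Tuple[int, ...], int]  # (feature vector, label)


def manhattan(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between two feature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def incremental_init(points: List[Tuple[int, ...]], k: int) -> List[int]:
    """Choose up to k initial medoids one at a time (farthest-first, assumed):
    start from point 0, then add the point farthest from its nearest medoid."""
    chosen = [0]
    while len(chosen) < k:
        dist = [min(manhattan(p, points[c]) for c in chosen) for p in points]
        best = max(range(len(points)), key=dist.__getitem__)
        if dist[best] == 0:  # every point coincides with an existing medoid
            break
        chosen.append(best)
    return chosen


def assign(points, medoids) -> Dict[int, List[int]]:
    """Map each medoid index to the indices of the points nearest to it.
    Ties go to the earlier medoid, so identical points share a cluster."""
    clusters: Dict[int, List[int]] = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: manhattan(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k: int, max_iter: int = 20) -> Dict[int, List[int]]:
    """Alternate assignment and medoid update until the medoid set is stable."""
    medoids = incremental_init(points, k)
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        # New medoid of each cluster: the member with the smallest total
        # distance to the other members of that cluster.
        new_medoids = [
            min(members, key=lambda c: sum(manhattan(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(points, medoids)


def remove_duplicates(samples: List[Sample], k: int) -> List[Sample]:
    """Cluster the feature vectors, then keep only the first occurrence of
    each (features, label) pair inside every cluster."""
    if not samples:
        return []
    clusters = k_medoids([f for f, _ in samples], k)
    keep = []
    for members in clusters.values():
        seen = set()
        for i in members:
            if samples[i] not in seen:
                seen.add(samples[i])
                keep.append(i)
    return [samples[i] for i in sorted(keep)]
```

With this reading, samples that share a feature vector but carry different labels are both kept, since they are not duplicates in the (features, label) sense; whether the paper treats such conflicting points differently is not stated in the abstract.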

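The feature-selection step can be illustrated with a small sketch, again under assumptions the abstract does not confirm. Information gain (the split criterion of ID3-style decision trees) stands in for the paper's newly defined feature evaluation index, which is not given here. Features with positive gain seed a hill-climbing local search that flips one feature in or out per step and accepts a move only if it raises held-out accuracy, or keeps it with fewer features. To keep the example short, a majority-vote lookup table keyed on the selected features substitutes for the decision-tree classifier. All names (`information_gain`, `holdout_accuracy`, `local_search`, `select_features`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence, Tuple

Sample = Tuple[Tuple[int, ...], int]  # (feature vector, label)


def entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, j: int) -> float:
    """Entropy reduction from splitting on feature j (ID3 split criterion),
    used as an assumed stand-in for the paper's feature evaluation index."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[j], []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def holdout_accuracy(train: List[Sample], val: List[Sample], subset) -> float:
    """Validation accuracy of a majority-vote lookup table keyed on the selected
    features (a simple stand-in for a decision tree). Unseen keys fall back to
    the overall majority label of the training set."""
    cols = sorted(subset)

    def key(row):
        return tuple(row[j] for j in cols)

    votes = {}
    for row, y in train:
        votes.setdefault(key(row), Counter())[y] += 1
    default = Counter(y for _, y in train).most_common(1)[0][0]
    hits = 0
    for row, y in val:
        table = votes.get(key(row))
        pred = table.most_common(1)[0][0] if table else default
        hits += pred == y
    return hits / len(val)


def local_search(n_features: int, evaluate: Callable[[FrozenSet[int]], float],
                 start, max_steps: int = 100) -> Tuple[FrozenSet[int], float]:
    """Hill climbing over feature subsets: flip one feature per step and accept
    the move only if it raises the score, or keeps it with fewer features."""

    def objective(s):
        return (evaluate(s), -len(s))

    current = frozenset(start)
    best = objective(current)
    for _ in range(max_steps):
        neighbours = [current ^ {j} for j in range(n_features)]
        neighbours = [s for s in neighbours if s]  # keep at least one feature
        if not neighbours:
            break
        cand_obj, cand = max(((objective(s), s) for s in neighbours), key=lambda t: t[0])
        if cand_obj <= best:
            break  # local optimum: no neighbour improves the objective
        current, best = cand, cand_obj
    return current, best[0]


def select_features(train: List[Sample], val: List[Sample]) -> Tuple[FrozenSet[int], float]:
    """Seed the search with the features that have positive gain on train."""
    rows = [r for r, _ in train]
    labels = [y for _, y in train]
    d = len(rows[0])
    gains = [information_gain(rows, labels, j) for j in range(d)]
    start = [j for j in range(d) if gains[j] > 1e-12] or [max(range(d), key=gains.__getitem__)]
    return local_search(d, lambda s: holdout_accuracy(train, val, s), start)
```

For example, with `train` and `val` given as lists of `(features, label)` pairs, `select_features(train, val)` returns the chosen feature indices and their held-out accuracy. Preferring smaller subsets on ties is what lets the search drop features that add nothing, which loosely mirrors the abstract's goal of pruning useless features.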