Phishing Email Detection Using Improved RCNN Model With Multilevel Vectors and Attention Mechanism

Phishing email is one of the most significant threats in the world today and has caused tremendous financial losses. Although countermeasures are continually being updated, their results remain unsatisfactory, and the volume of phishing emails has grown at an alarming rate in recent years. More effective phishing detection technology is therefore needed to curb this threat. In this paper, we first analyze the structure of emails. Then, based on an improved recurrent convolutional neural network (RCNN) model with multilevel vectors and an attention mechanism, we propose a new phishing email detection model named THEMIS, which models emails simultaneously at the email-header, email-body, character, and word levels. To evaluate the effectiveness of THEMIS, we use an imbalanced dataset with a realistic ratio of phishing to legitimate emails. The experimental results show that THEMIS reaches an overall accuracy of 99.848% with a false positive rate (FPR) of 0.043%. Together, the high accuracy and low FPR mean that the filter identifies phishing emails with high probability while wrongly flagging very few legitimate emails. This result is superior to that of existing detection methods and verifies the effectiveness of THEMIS in detecting phishing emails.
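The abstract names the building blocks (an RCNN encoder, multilevel vectors, an attention mechanism) without giving equations. The pure-Python sketch below shows the general shape of such an encoder: RCNN-style left and right recurrent contexts around each token, as in Lai et al.'s RCNN (2015), followed by additive attention pooling. It is not THEMIS's implementation. The zero boundary contexts, the use of attention pooling in place of RCNN max-pooling, the function names (`rcnn_features`, `attention_pool`), and all weights are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def vadd(a: Vector, b: Vector) -> Vector:
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]


def vtanh(a: Vector) -> Vector:
    """Element-wise tanh."""
    return [math.tanh(x) for x in a]


def rcnn_features(emb: List[Vector], wl: Matrix, wsl: Matrix, wr: Matrix,
                  wsr: Matrix, w2: Matrix, b2: Vector) -> List[Vector]:
    """RCNN-style token features after Lai et al. (2015):
    c_l(i) = tanh(Wl c_l(i-1) + Wsl e(i-1)), c_r(i) = tanh(Wr c_r(i+1) + Wsr e(i+1)),
    y(i) = tanh(W2 [c_l(i); e(i); c_r(i)] + b2).
    Boundary contexts are zero vectors here (assumption; Lai et al. learn them)."""
    n, ctx = len(emb), len(wl)
    left = [[0.0] * ctx for _ in range(n)]
    right = [[0.0] * ctx for _ in range(n)]
    for i in range(1, n):  # left-to-right recurrent context
        left[i] = vtanh(vadd(matvec(wl, left[i - 1]), matvec(wsl, emb[i - 1])))
    for i in range(n - 2, -1, -1):  # right-to-left recurrent context
        right[i] = vtanh(vadd(matvec(wr, right[i + 1]), matvec(wsr, emb[i + 1])))
    # Concatenate [left; embedding; right] per token, then project with tanh.
    return [vtanh(vadd(matvec(w2, left[i] + emb[i] + right[i]), b2)) for i in range(n)]


def attention_pool(hidden: List[Vector], w: Matrix, b: Vector,
                   context: Vector) -> Tuple[Vector, Vector]:
    """Additive attention: u_i = tanh(W h_i + b), alpha = softmax(u_i . context),
    s = sum_i alpha_i h_i. Used here instead of max-pooling (assumption)."""
    keys = [vtanh(vadd(matvec(w, h), b)) for h in hidden]
    scores = [sum(k * c for k, c in zip(key, context)) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


# Toy usage with hand-picked weights (illustrative only, not trained values).
I2 = [[1.0, 0.0], [0.0, 1.0]]
emb = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # three 2-d token embeddings
w2 = [[0.2, 0.1, 0.5, -0.4, 0.3, 0.0],        # 2 x 6: maps [c_l; e; c_r] to 2-d
      [0.0, -0.3, 0.1, 0.6, -0.2, 0.4]]
feats = rcnn_features(emb, I2, I2, I2, I2, w2, [0.0, 0.1])
pooled, weights = attention_pool(feats, I2, [0.0, 0.0], [1.0, -1.0])
print(pooled, weights)
```

Following the abstract, an encoder of this kind would be applied to character-level and word-level inputs of both the header and the body. Concatenating the four pooled vectors (for example `header_char + header_word + body_char + body_word` as Python lists) before a final classifier is one hypothetical way to fuse them; the paper's actual fusion scheme is not described in this abstract.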
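Accuracy and FPR are standard confusion-matrix ratios. Counting phishing as the positive class, accuracy = (TP + TN) / (TP + TN + FP + FN) and FPR = FP / (FP + TN), i.e. the share of legitimate emails wrongly flagged. The minimal Python sketch below computes both, plus recall on the phishing class. The label encoding (1 = phishing) and the counts in the usage lines are hypothetical and do not come from the paper's dataset; they illustrate why, on an imbalanced dataset, accuracy and FPR are best read alongside recall: a filter that never flags anything still scores high accuracy and zero FPR.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int  # phishing correctly flagged
    fp: int  # legitimate wrongly flagged as phishing
    tn: int  # legitimate correctly passed through
    fn: int  # phishing missed


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count outcomes with 1 = phishing (positive class) and 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
    )


def _ratio(num: int, den: int) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def accuracy(c: Confusion) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def false_positive_rate(c: Confusion) -> float:
    """Share of legitimate emails wrongly flagged: FP / (FP + TN)."""
    return _ratio(c.fp, c.fp + c.tn)


def recall(c: Confusion) -> float:
    """Share of phishing emails caught: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


# Hypothetical imbalanced test set: 990 legitimate, 10 phishing.
y_true = [0] * 990 + [1] * 10
never_flag = [0] * 1000                        # trivial filter that flags nothing
some_filter = [1] + [0] * 989 + [1] * 9 + [0]  # 1 false alarm, 9 of 10 phishing caught

for name, y_pred in [("never_flag", never_flag), ("some_filter", some_filter)]:
    c = confusion_from_labels(y_true, y_pred)
    print(name, accuracy(c), false_positive_rate(c), recall(c))
```

The trivial `never_flag` filter reaches 0.99 accuracy with an FPR of 0 yet catches no phishing, which is why a detector's results on an imbalanced dataset are more informative when both error types are reported.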