Catching the Phish: Detecting Phishing Attacks using Recurrent Neural Networks (RNNs)

As online services have become part of daily life, so have malicious attempts to trick individuals into performing undesired actions, often to the benefit of the adversary. Phishing is the most common form of these attempts, delivered chiefly through emails and websites. Defending against such attacks requires automated mechanisms that identify malicious content before it reaches users. Machine learning techniques have gradually become the standard for such classification problems. However, identifying common measurable features of phishing content (e.g., in emails) is notoriously difficult. To address this problem, we present a novel study of a phishing content classifier based on a recurrent neural network (RNN), which learns such features without human input. At this stage, we limit our scope to emails, but the approach can be extended to websites. Our results show that the proposed system outperforms state-of-the-art tools. Furthermore, our classifier is efficient and takes into account only the text of the email, in particular its textual structure. Since these features are rarely considered in email classification, we argue that our classifier can complement existing classifiers with high information gain.
