Adversarial Sampling Attacks Against Phishing Detection

Phishing websites trick users into believing that they are interacting with a legitimate website and thereby capture sensitive information such as user names, passwords, credit card numbers, and other personal data. Machine learning is a promising technique for distinguishing phishing websites from legitimate ones. However, machine learning approaches are susceptible to adversarial learning techniques, which attempt to degrade the accuracy of a trained classifier model. In this work, we investigate the robustness of machine learning based phishing detection in the face of adversarial learning techniques. We propose a simple but effective approach to simulate attacks by generating adversarial samples through direct feature manipulation. We assume that the attacker has limited knowledge of the features, the learning models, and the datasets used for training. We conducted experiments on four publicly available datasets. Our experiments reveal that the phishing detection mechanisms are vulnerable to adversarial learning techniques. Specifically, the identification rate for phishing websites dropped to 70% when a single feature was manipulated, and to zero percent when four features were manipulated. This result means that any phishing sample that a classifier model would otherwise have detected correctly can bypass the classifier by changing at most four feature values: a small effort for the attacker in return for a large reward. We define the concept of a vulnerability level for each dataset, which measures the number of features that can be manipulated and the cost of each manipulation. Such a metric allows us to compare multiple defense models.
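The idea of generating adversarial samples by direct feature manipulation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy detector `is_phishing`, its `THRESHOLD`, the binary feature vectors, and the brute-force search in `evades`. It only shows how the identification rate can be measured as a function of the number of features an attacker is allowed to change.

```python
from itertools import combinations

# Hypothetical toy detector (not the paper's model): a sample is flagged
# as phishing when at least THRESHOLD of its binary features are set to 1.
THRESHOLD = 2


def is_phishing(features):
    return sum(features) >= THRESHOLD


def evades(sample, k, classifier=is_phishing):
    """Return True if flipping at most k features makes the classifier
    label the sample as legitimate (brute-force search over subsets)."""
    n = len(sample)
    for r in range(k + 1):
        for idx in combinations(range(n), r):
            candidate = list(sample)
            for i in idx:
                candidate[i] = 1 - candidate[i]  # flip one binary feature
            if not classifier(candidate):
                return True
    return False


def detection_rate(samples, k, classifier=is_phishing):
    """Fraction of originally detected phishing samples that are still
    detected when the attacker may manipulate up to k features."""
    detected = [s for s in samples if classifier(s)]
    if not detected:
        return 0.0
    still_caught = sum(1 for s in detected if not evades(s, k, classifier))
    return still_caught / len(detected)


# Made-up phishing samples with 2, 3, 4, and 5 suspicious features.
phishing_samples = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
]

if __name__ == "__main__":
    for k in range(5):
        print(f"features manipulated <= {k}: "
              f"detection rate = {detection_rate(phishing_samples, k):.2f}")
```

On this toy data the detection rate falls from 1.00 with no manipulation to 0.00 once four features may be changed. The numbers are a property of the made-up detector and samples, not a reproduction of the reported results. A real evaluation would replace `is_phishing` with a trained model and restrict flips to features the attacker can plausibly control.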
