Domain Classifier: Compromised Machines Versus Malicious Registrations

In “phishing attacks”, phishing websites disguised as trustworthy websites attempt to steal sensitive information. Remediation and mitigation options differ depending on whether the phishing website is hosted on a legitimate but compromised domain, in which case the domain owner is also a victim, or whether the domain itself is maliciously registered. We accordingly attempt to tackle here the important question of classifying known phishing sites as either compromised or maliciously registered. Following the recent adoption of GDPR standards now putting off-limits any personal data, few relevant literature criteria still satisfy those standards. We propose here a machine-learning based domain classifier, introducing nine novel features which exploit the internet presence and history of a domain, using only publicly available information. Evaluation of our domain classifier was performed with a corpus of phishing websites hosted on over 1,000 compromised domains and 10,000 malicious domains. In the randomized evaluation, our domain classifier achieved over 92% accuracy with under 8% false positive rate, with compromised cases as the positive class. We have also collected over 180,000 phishing website instances over the past 3 years. Using our classifier we show that 73% of the websites hosting attacks are compromised while the remaining 27% belong to the attackers.

[1]  Carolyn Penstein Rosé,et al.  CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites , 2011, TSEC.

[2]  Marco Balduzzi,et al.  Automatic Extraction of Indicators of Compromise for Web Applications , 2016, WWW.

[3]  Ilango Krishnamurthi,et al.  A comprehensive and efficacious architecture for detecting phishing webpages , 2014, Comput. Secur..

[4]  Mehryar Mohri,et al.  AUC Optimization vs. Error Rate Minimization , 2003, NIPS.

[5]  J. Hertz,et al.  Generalization in a linear perceptron in the presence of noise , 1992 .

[6]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[7]  Fabio Roli,et al.  DeltaPhish: Detecting Phishing Webpages in Compromised Websites , 2017, ESORICS.

[8]  Tyler Moore,et al.  Evil Searching: Compromise and Recompromise of Internet Hosts for Phishing , 2009, Financial Cryptography.

[9]  Zhou Li,et al.  Don't Let One Rotten Apple Spoil the Whole Barrel: Towards Automated Detection of Shadowed Domains , 2017, CCS.

[10]  Nick Feamster,et al.  PREDATOR: Proactive Recognition and Elimination of Domain Abuse at Time-Of-Registration , 2016, CCS.

[11]  Mike Thelwall,et al.  A fair history of the Web? Examining country balance in the Internet Archive , 2004 .

[12]  Qian Cui,et al.  Tracking Phishing Attacks Over Time , 2017, WWW.

[13]  Tyler Moore,et al.  The Impact of Incentives on Notice and Take-down , 2008, WEIS.

[14]  Ankit Kumar Jain,et al.  Phishing Detection: Analysis of Visual Similarity Based Approaches , 2017, Secur. Commun. Networks.