Needle in a Haystack: Tracking Down Elite Phishing Domains in the Wild

Today's phishing websites are constantly evolving to deceive users and evade the detection. In this paper, we perform a measurement study on squatting phishing domains where the websites impersonate trusted entities not only at the page content level but also at the web domain level. To search for squatting phishing pages, we scanned five types of squatting domains over 224 million DNS records and identified 657K domains that are likely impersonating 702 popular brands. Then we build a novel machine learning classifier to detect phishing pages from both the web and mobile pages under the squatting domains. A key novelty is that our classifier is built on a careful measurement of evasive behaviors of phishing pages in practice. We introduce new features from visual analysis and optical character recognition (OCR) to overcome the heavy content obfuscation from attackers. In total, we discovered and verified 1,175 squatting phishing pages. We show that these phishing pages are used for various targeted scams, and are highly effective to evade detection. More than 90% of them successfully evaded popular blacklists for at least a month.

[1]  Hao Chen,et al.  MagNet: A Two-Pronged Defense against Adversarial Examples , 2017, CCS.

[2]  Stephen Groat,et al.  GoldPhish: Using Images for Content-Based Phishing Analysis , 2010, 2010 Fifth International Conference on Internet Monitoring and Protection.

[3]  Eric Medvet,et al.  Visual-similarity-based phishing detection , 2008, SecureComm.

[4]  Adam Doupé,et al.  Inside a phisher's mind: Understanding the anti-phishing ecosystem through phishing kit analysis , 2018, 2018 APWG Symposium on Electronic Crime Research (eCrime).

[5]  Thamar Solorio,et al.  Lexical feature based phishing URL detection using online learning , 2010, AISec '10.

[6]  Yanjun Qi,et al.  Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks , 2017, NDSS.

[7]  Youssef Iraqi,et al.  Phishing Detection: A Literature Survey , 2013, IEEE Communications Surveys & Tutorials.

[8]  Christopher Krügel,et al.  Meerkat: Detecting Website Defacements through Image-based Object Recognition , 2015, USENIX Security Symposium.

[9]  Tyler Moore,et al.  Measuring the Perpetrators and Funders of Typosquatting , 2010, Financial Cryptography.

[10]  Qian Cui,et al.  Tracking Phishing Attacks Over Time , 2017, WWW.

[11]  Zhou Li,et al.  FrameHanger: Evaluating and Classifying Iframe Injection at Large Scale , 2018, SecureComm.

[12]  Rachel Greenstadt,et al.  PhishZoo: Detecting Phishing Websites by Looking at Them , 2011, 2011 IEEE Fifth International Conference on Semantic Computing.

[13]  Wouter Joosen,et al.  Bitsquatting: exploiting bit-flips for fun, or profit? , 2013, WWW '13.

[14]  Justin Tung Ma,et al.  Learning to detect malicious URLs , 2011, TIST.

[15]  Hao Chen,et al.  iPhish: Phishing Vulnerabilities on Consumer Electronics , 2008, UPSEC.

[16]  Carolyn Penstein Rosé,et al.  CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites , 2011, TSEC.

[17]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[18]  Chris Kanich,et al.  The Long "Taile" of Typosquatting Domain Names , 2014, USENIX Security Symposium.

[19]  Wei Xu,et al.  JStill: mostly static detection of obfuscated malicious JavaScript code , 2013, CODASPY.

[20]  Patrick D. McDaniel,et al.  Adversarial Examples for Malware Detection , 2017, ESORICS.

[21]  Fabio Roli,et al.  DeltaPhish: Detecting Phishing Webpages in Compromised Websites , 2017, ESORICS.

[22]  Wouter Joosen,et al.  Seven Months' Worth of Mistakes: A Longitudinal Study of Typosquatting Abuse , 2015, NDSS.

[23]  Arvind Narayanan,et al.  Online Tracking: A 1-million-site Measurement and Analysis , 2016, CCS.

[24]  Dawn Xiaodong Song,et al.  Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong , 2017, ArXiv.

[25]  Heejo Lee,et al.  Detecting Malicious Web Links and Identifying Their Attack Types , 2011, WebApps.

[26]  Patrick Traynor,et al.  Sending Out an SMS: Characterizing the Security of the SMS Ecosystem with Public Gateways , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[27]  Yizheng Chen,et al.  Enabling Network Security Through Active DNS Datasets , 2016, RAID.

[28]  Steven D. Gribble,et al.  Cutting through the Confusion: A Measurement Study of Homograph Attacks , 2006, USENIX Annual Technical Conference, General Track.

[29]  Lorrie Faith Cranor,et al.  Protecting people from phishing: the design and evaluation of an embedded training email system , 2007, CHI.

[30]  Steven Bird,et al.  NLTK: The Natural Language Toolkit , 2002, ACL.

[31]  Ananthram Swami,et al.  Practical Black-Box Attacks against Machine Learning , 2016, AsiaCCS.

[32]  Samuel Marchal,et al.  Know Your Phish: Novel Techniques for Detecting Phishing Sites and Their Targets , 2015, 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS).

[33]  Wei You,et al.  Cracking Classifiers for Evasion: A Case Study on the Google's Phishing Pages Filter , 2016, WWW.

[34]  Gang Wang,et al.  End-to-End Measurements of Email Spoofing Attacks , 2018, USENIX Security Symposium.

[35]  Lorrie Faith Cranor,et al.  Cantina: a content-based approach to detecting phishing web sites , 2007, WWW '07.

[36]  Ananthram Swami,et al.  Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples , 2016, ArXiv.

[37]  Meng Luo,et al.  Hindsight: Understanding the Evolution of UI Vulnerabilities in Mobile Browsers , 2017, CCS.

[38]  Nick Nikiforakis,et al.  Dial One for Scam: A Large-Scale Analysis of Technical Support Scams , 2016, NDSS.

[39]  Gianluca Stringhini,et al.  Towards Detecting Compromised Accounts on Social Networks , 2015, IEEE Transactions on Dependable and Secure Computing.

[40]  Xiao Han,et al.  PhishEye: Live Monitoring of Sandboxed Phishing Kits , 2016, CCS.

[41]  Gang Wang,et al.  Towards Understanding the Adoption of Anti-Spoofing Protocols in Email Systems , 2018, 2018 IEEE Cybersecurity Development (SecDev).

[42]  Vitaly Shmatikov,et al.  Fooling OCR Systems with Adversarial Text Images , 2018, ArXiv.

[43]  Xiaotie Deng,et al.  Detection of phishing webpages based on visual similarity , 2005, WWW '05.

[44]  Nikolaos Pitropakis,et al.  Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse , 2017, CCS.

[45]  Christopher Krügel,et al.  Measuring E-mail header injections on the world wide web , 2018, SAC.

[46]  Ben Zorn,et al.  "NOFUS: Automatically Detecting" + String.fromCharCode(32) + "ObFuSCateD ".toLowerCase() + "JavaScript Code" , 2011 .