Understanding the Phishing Ecosystem

In “phishing attacks”, phishing websites mimic trustworthy websites in order to steal sensitive information from end users. Although both academia and industry have focused research on anti-phishing detection techniques, phishing has become an increasingly serious online threat. Our inability to slow down phishing attacks shows that we need to go beyond detection and focus more on understanding the phishing ecosystem. In this thesis, we make three contributions toward understanding the phishing ecosystem and offering insight for future anti-phishing efforts.
First, we present a new comparative study of the life cycles of phishing and malware attacks. Specifically, we use the public click-through statistics of the Bitly URL shortening service to analyze the click-through rate and timespan of phishing and malware attacks before (and after) they were reported. We find that the efforts against phishing attacks are stronger than those against malware attacks. We also find phishing activity indicating that mitigation strategies are not taking down phishing websites quickly enough.
Second, we develop a method for finding similarities between the DOMs of phishing attacks, since phishing attacks are known to frequently be variations of previous attacks. We find that existing methods do not capture the structure of the DOM, and we question whether they therefore miss some similar attacks. We accordingly evaluate the feasibility of using Pawlik and Augsten’s recent implementation of Tree Edit Distance computation (AP-TED) to compare DOMs and identify similar phishing attack instances. Our method agrees with existing ones that 94% of the attacks in our phishing database are replicas. It also distinguishes degrees of similarity more precisely, but at a higher computational cost. The high agreement between methods reinforces the understanding that most phishing attacks are variations of earlier ones, which has implications for future anti-phishing strategies.
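The DOM comparison in the second contribution can be illustrated with a small sketch. It reduces each page to a tag-only tree (attributes and text are dropped, an assumption made for brevity) and computes a unit-cost tree edit distance with the classic recursive forest-distance recurrence, memoized. This is not Pawlik and Augsten’s AP-TED: AP-TED computes the same distance with a far more efficient decomposition strategy, whereas this version is only practical for small trees. The helper names (`dom_tree`, `tree_edit_distance`, `normalized_distance`) and the normalization by the two trees’ combined node count are hypothetical choices, not the thesis’s definitions.

```python
from functools import lru_cache
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagTreeBuilder(HTMLParser):
    """Builds a tag-only DOM tree; attributes and text are deliberately ignored."""

    def __init__(self):
        super().__init__()
        self.stack = [("#root", [])]  # open elements as (tag, child list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            self.stack[-1][1].append((tag, ()))
        else:
            self.stack.append((tag, []))

    def handle_startendtag(self, tag, attrs):  # e.g. <input />
        self.stack[-1][1].append((tag, ()))

    def handle_endtag(self, tag):
        # Close up to the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                while len(self.stack) > i:
                    self._close_top()
                return

    def _close_top(self):
        tag, kids = self.stack.pop()
        self.stack[-1][1].append((tag, tuple(kids)))

    def tree(self):
        self.close()
        while len(self.stack) > 1:  # close any elements left open
            self._close_top()
        tag, kids = self.stack[0]
        return (tag, tuple(kids))


def dom_tree(html):
    """Hypothetical helper: parse HTML into a hashable (tag, children) tuple tree."""
    builder = TagTreeBuilder()
    builder.feed(html)
    return builder.tree()


def size(forest):
    return sum(1 + size(kids) for _, kids in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between ordered forests (tuples of trees)."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v, v_kids), (w, w_kids) = f[-1], g[-1]  # rightmost roots
    return min(
        forest_distance(f[:-1] + v_kids, g) + 1,  # delete v
        forest_distance(f, g[:-1] + w_kids) + 1,  # insert w
        forest_distance(f[:-1], g[:-1])           # map v to w ...
        + forest_distance(v_kids, w_kids)
        + (0 if v == w else 1),                   # ... renaming if tags differ
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def normalized_distance(t1, t2):
    # Assumed normalization: TED never exceeds |t1| + |t2|, so this lies in [0, 1].
    return tree_edit_distance(t1, t2) / (size((t1,)) + size((t2,)))


page_a = "<html><body><form><input><input><button>Sign in</button></form></body></html>"
page_b = ("<html><body><div><form><input><input><button>Log in</button>"
          "</form></div></body></html>")
a, b = dom_tree(page_a), dom_tree(page_b)
print(tree_edit_distance(a, b))   # 1: only the wrapping <div> differs
print(normalized_distance(a, b))  # 1/15, about 0.067
```

In the example, wrapping the login form in one extra `<div>` costs a single insertion, so the two pages count as near-duplicates even though their button text differs (text is ignored). For real DOMs with hundreds of nodes this recursion becomes slow and can exceed Python’s default recursion limit, which is where an efficient algorithm such as AP-TED matters.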
Third, we develop a machine-learning domain classifier that exploits a domain’s history and Internet presence. Using only publicly available information, it determines whether a known phishing website is hosted on a legitimate but compromised domain, in which case the domain owner is also a victim, or on a maliciously registered domain. This is especially relevant given the recent adoption of the General Data Protection Regulation (GDPR), which prevents certain registration information from being made publicly available. Our classifier achieves 94% accuracy on future malicious domains, while maintaining 88% and 92% accuracy, respectively, on malicious and compromised datasets from two other sources. Accurate domain classification offers insight into different take-down strategies and into how registrars can prevent fraudulent registrations.
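The abstract does not list the classifier’s features or learning algorithm, so the following is only a sketch of the general idea under stated assumptions. It invents three public signals standing in for a domain’s history and Internet presence (`age_days`, `archive_snapshots`, `search_hits`, all hypothetical) and substitutes a plain logistic regression, trained by batch gradient descent, for whatever model the thesis actually uses. The training data are toy values, not measurements, and the sketch reproduces none of the reported accuracies.

```python
import math
from dataclasses import dataclass


@dataclass
class DomainFeatures:
    """Hypothetical public signals about a domain, taken when its phishing page is reported."""
    age_days: float         # days since the domain was first observed online
    archive_snapshots: int  # earlier captures of the domain in a public web archive
    search_hits: int        # pages of the domain indexed by a search engine

    def vector(self):
        # log1p tames heavy-tailed counts; the leading 1.0 is the bias input.
        return [1.0, math.log1p(self.age_days),
                math.log1p(self.archive_snapshots), math.log1p(self.search_hits)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, lr=0.05, epochs=10_000):
    """Batch gradient descent on mean logistic loss; label 1 = malicious registration."""
    xs = [s.vector() for s in samples]
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def is_malicious(w, domain):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, domain.vector()))) >= 0.5


# Toy, invented training data: 1 = maliciously registered, 0 = compromised.
train = [
    DomainFeatures(0, 0, 0), DomainFeatures(8, 0, 0), DomainFeatures(15, 1, 3),
    DomainFeatures(1023, 99, 249), DomainFeatures(4095, 399, 999),
    DomainFeatures(1499, 40, 80),
]
labels = [1, 1, 1, 0, 0, 0]
w = train_logistic(train, labels)
print(is_malicious(w, DomainFeatures(2, 0, 0)))         # True: new, unseen domain
print(is_malicious(w, DomainFeatures(2047, 199, 499)))  # False: long-lived domain
```

The assumption encoded here is that a maliciously registered domain has little public history when its phishing page appears, while a compromised domain has accumulated years of archive captures and indexed pages. The log transform keeps a few very popular domains from dominating the learned weights.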