Detecting Malicious Exploit Kits using Tree-based Similarity Searches

Unfortunately, the computers we use for everyday activities can be infiltrated while simply browsing innocuous sites that, unbeknownst to the website owner, may be laden with malicious advertisements. So-called malvertising, redirects browsers to web-based exploit kits that are designed to find vulnerabilities in the browser and subsequently download malicious payloads. We propose a new approach for detecting such malfeasance by leveraging the inherent structural patterns in HTTP traffic to classify exploit kit instances. Our key insight is that an exploit kit leads the browser to download payloads using multiple requests from malicious servers. We capture these interactions in a "tree-like" form, and using a scalable index of malware samples, model the detection process as a subtree similarity search problem. The approach is evaluated on 3800 hours of real-world traffic including over 4 billion flows and reduces false positive rates by four orders of magnitude over current state-of-the-art techniques with comparable true positive rates. We show that our approach can operate in near real-time, and is able to handle peak traffic levels on a large enterprise network --- identifying 28 new exploit kit instances during our analysis period.

[1]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[2]  Saumya K. Debray,et al.  Automatic Simplification of Obfuscated JavaScript Code: A Semantics-Based Approach , 2012, 2012 IEEE Sixth International Conference on Software Security and Reliability.

[3]  Fang Yu,et al.  Finding the Linchpins of the Dark Web: a Study on Topologically Dedicated Hosts on Malicious Web Infrastructures , 2013, 2013 IEEE Symposium on Security and Privacy.

[4]  Thamar Solorio,et al.  Lexical feature based phishing URL detection using online learning , 2010, AISec '10.

[5]  Lawrence K. Saul,et al.  Beyond blacklists: learning to detect malicious web sites from suspicious URLs , 2009, KDD.

[6]  Somesh Jha,et al.  An architecture for generating semantics-aware signatures , 2005 .

[7]  Niels Provos,et al.  All Your iFRAMEs Point to Us , 2008, USENIX Security Symposium.

[8]  Justin Tung Ma,et al.  Learning to detect malicious URLs , 2011, TIST.

[9]  Giovanni Vigna,et al.  Prophiler: a fast filter for the large-scale detection of malicious web pages , 2011, WWW.

[10]  Kang G. Shin,et al.  RB-Seeker: Auto-detection of Redirection Botnets , 2009, NDSS.

[11]  Stefan Savage,et al.  Manufacturing compromise: the emergence of exploit-as-a-service , 2012, CCS.

[12]  Antonio Nucci,et al.  Detecting malicious HTTP redirections using trees of user browsing activity , 2014, IEEE INFOCOM 2014 - IEEE Conference on Computer Communications.

[13]  Christopher Krügel,et al.  PExy: The Other Side of Exploit Kits , 2014, DIMVA.

[14]  Christopher Krügel,et al.  Detection and analysis of drive-by-download attacks and malicious JavaScript code , 2010, WWW '10.

[15]  Gang Wang,et al.  Detecting malicious landing pages in Malware Distribution Networks , 2013, 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[16]  Fang Yu,et al.  Knowing your enemy: understanding and detecting malicious web advertising , 2012, CCS '12.

[17]  Naren Ramakrishnan,et al.  Detection of stealthy malware activities with traffic causality and scalable triggering relation discovery , 2014, AsiaCCS.

[18]  Jayant Madhavan,et al.  Socialising Data with Google Fusion Tables , 2010, IEEE Data Eng. Bull..

[19]  Gianluca Stringhini,et al.  Shady paths: leveraging surfing crowds to detect malicious web pages , 2013, CCS.

[20]  Niels Provos,et al.  The Ghost in the Browser: Analysis of Web-based Malware , 2007, HotBots.

[21]  V. N. Venkatakrishnan,et al.  WebWinnow: leveraging exploit kit workflows to detect malicious urls , 2014, CODASPY '14.

[22]  Wei Xu,et al.  JStill: mostly static detection of obfuscated malicious JavaScript code , 2013, CODASPY.

[23]  Paolo Milani Comparetti,et al.  EvilSeed: A Guided Approach to Finding Malicious Web Pages , 2012, 2012 IEEE Symposium on Security and Privacy.

[24]  Ben Zorn,et al.  Kizzle: A Signature Compiler for Exploit Kits , 2017 .

[25]  Vivek S. Pai,et al.  Towards understanding modern web traffic , 2011, SIGMETRICS '11.

[26]  Helen J. Wang,et al.  Clickjacking: Attacks and Defenses , 2012, USENIX Security Symposium.

[27]  Shouhuai Xu,et al.  Cross-layer detection of malicious websites , 2013, CODASPY.

[28]  Sandeep Yadav,et al.  Detecting algorithmically generated malicious domain names , 2010, IMC '10.

[29]  Gianluca Stringhini,et al.  The Dark Alleys of Madison Avenue: Understanding Malicious Advertisements , 2014, Internet Measurement Conference.

[30]  Andreas Dewald,et al.  Forschungsberichte der Fakultät IV – Elektrotechnik und Informatik C UJO : Efficient Detection and Prevention of Drive-by-Download Attacks , 2010 .

[31]  Roberto Perdisci,et al.  WebWitness: Investigating, Categorizing, and Mitigating Malware Download Paths , 2015, USENIX Security Symposium.

[32]  Kang G. Shin,et al.  Large-scale malware indexing using function-call graphs , 2009, CCS.

[33]  Sara Cohen Indexing for subtree similarity-search using edit distance , 2013, SIGMOD '13.

[34]  Divesh Srivastava,et al.  Weighted Set-Based String Similarity , 2010, IEEE Data Eng. Bull..

[35]  Kang Li,et al.  ClickMiner: Towards Forensic Reconstruction of User-Browser Interactions from Network Traces , 2014, CCS.