Cujo: efficient detection and prevention of drive-by-download attacks

The JavaScript language is a core component of active and dynamic web content in the Internet today. Besides its great success in enhancing web applications, however, JavaScript provides the basis for so-called drive-by downloads---attacks exploiting vulnerabilities in web browsers and their extensions for unnoticeably downloading malicious software. Due to the diversity and frequent use of obfuscation in these attacks, static code analysis is largely ineffective in practice. While dynamic analysis and honeypots provide means to identify drive-by-download attacks, current approaches induce a significant overhead which renders immediate prevention of attacks intractable. In this paper, we present Cujo, a system for automatic detection and prevention of drive-by-download attacks. Embedded in a web proxy, Cujo transparently inspects web pages and blocks delivery of malicious JavaScript code. Static and dynamic code features are extracted on-the-fly and analysed for malicious patterns using efficient techniques of machine learning. We demonstrate the efficacy of Cujo in different experiments, where it detects 94% of the drive-by downloads with few false alarms and a median run-time of 500 ms per web page---a quality that, to the best of our knowledge, has not been attained in previous work on detection of drive-by-download attacks.

[1]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[2]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[3]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[4]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[5]  Gunnar Rätsch,et al.  An introduction to kernel-based learning algorithms , 2001, IEEE Trans. Neural Networks.

[6]  Salvatore J. Stolfo,et al.  Anagram: A Content Anomaly Detector Resistant to Mimicry Attack , 2006, RAID.

[7]  Wenke Lee,et al.  q-gram matching using tree models , 2006 .

[8]  Christopher Krügel,et al.  Noxes: a client-side solution for mitigating cross-site scripting attacks , 2006, SAC '06.

[9]  Konrad Rieck,et al.  Detecting Unknown Network Attacks Using Language Models , 2006, DIMVA.

[10]  Martin Johns,et al.  On JavaScript Malware and related threats , 2008, Journal in Computer Virology.

[11]  Niels Provos,et al.  The Ghost in the Browser: Analysis of Web-based Malware , 2007, HotBots.

[12]  Konrad Rieck,et al.  Linear-Time Computation of Similarity Measures for Sequential Data , 2008, J. Mach. Learn. Res..

[13]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[14]  Niels Provos,et al.  All Your iFRAMEs Point to Us , 2008, USENIX Security Symposium.

[15]  Charlie Miller,et al.  Engineering Heap Overflow Exploits with JavaScript , 2008, WOOT.

[16]  Christopher Krügel,et al.  Defending Browsers against Drive-by Downloads: Mitigating Heap-Spraying Code Injection Attacks , 2009, DIMVA.

[17]  Benjamin Livshits,et al.  NOZZLE: A Defense Against Heap-spraying Code Injection Attacks , 2009, USENIX Security Symposium.

[18]  Christopher Krügel,et al.  Mitigating Drive-By Download Attacks: Challenges and Open Problems , 2009, iNetSeC.

[19]  Jose Nazario,et al.  PhoneyC: A Virtual Client Honeypot , 2009, LEET.

[20]  Christopher Krügel,et al.  Detection and analysis of drive-by-download attacks and malicious JavaScript code , 2010, WWW '10.

[21]  Andreas Dewald,et al.  ADSandbox: sandboxing JavaScript to fight malicious websites , 2010, SAC '10.