Early detection of malicious behavior in JavaScript code

Malicious JavaScript code is widely used to exploit vulnerabilities in web browsers and to infect users with malicious software. Static detection methods fail to protect against this threat, as they cannot cope with the complexity and dynamics of interpreted code. In contrast, dynamic analysis of JavaScript code at run-time has proven effective in identifying malicious behavior. While the code is executing, however, damage may already occur, so early detection is critical for effective protection. In this paper, we introduce EarlyBird: a detection method optimized for early identification of malicious behavior in JavaScript code. The method uses machine learning techniques to jointly optimize the accuracy and the time of detection. In an evaluation with hundreds of real attacks, EarlyBird precisely identifies malicious behavior while reducing the amount of malicious code that is executed by a factor of 2 (43%) on average.
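To make the idea of early detection concrete, the following minimal sketch scores a run-time event trace incrementally and raises an alarm as soon as the accumulated score crosses a threshold, instead of waiting for the script to finish. It is an illustration under stated assumptions, not EarlyBird itself: the abstract does not specify the features or the learning model, so the bigram features, the event names, the hand-set weights, and the function `detect_early` are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Bigram = Tuple[str, str]


def detect_early(
    events: Sequence[str],
    weights: Dict[Bigram, float],
    threshold: float,
) -> Optional[int]:
    """Return the 1-based position of the event at which the alarm fires.

    Returns None if the accumulated score never reaches the threshold.
    The score is a running sum of weights over consecutive event pairs,
    so the decision can be made while the trace is still being produced.
    """
    score = 0.0
    prev: Optional[str] = None
    for position, event in enumerate(events, start=1):
        if prev is not None:
            # Unknown bigrams contribute nothing to the score.
            score += weights.get((prev, event), 0.0)
            if score >= threshold:
                return position
        prev = event
    return None


# Hypothetical weights; in a learned model these would come from training.
WEIGHTS: Dict[Bigram, float] = {
    ("new_string", "concat"): 0.2,
    ("concat", "concat"): 0.4,
    ("concat", "eval"): 0.8,
    ("eval", "create_object"): 0.3,
}

# Hypothetical trace of a script that builds a long string and evaluates it.
suspicious_trace = [
    "new_string", "concat", "concat", "concat",
    "eval", "create_object", "call_method",
]
benign_trace = ["new_string", "concat", "call_method"]

alarm_at = detect_early(suspicious_trace, WEIGHTS, threshold=1.5)
benign_alarm = detect_early(benign_trace, WEIGHTS, threshold=1.5)

if alarm_at is not None:
    executed = alarm_at / len(suspicious_trace)
    print(f"alarm at event {alarm_at}, {executed:.0%} of the trace executed")
```

In this toy setup the threshold controls the trade-off the abstract describes: a lower threshold stops execution earlier but risks more false alarms, while a higher one is more conservative but lets more of the script run.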
