Anything to Hide? Studying Minified and Obfuscated Code in the Web

JavaScript has been used for various attacks on client-side web applications. To hinder both manual and automated analysis from detecting malicious scripts, code minification and code obfuscation may hide the behavior of a script. Unfortunately, little is currently known about how real-world websites use such code transformations. This paper presents an empirical study of obfuscation and minification in 967,149 scripts (424,023 unique) from the top 100,000 websites. The core of our study is a highly accurate (95%-100%) neural network-based classifier that we train to identify whether obfuscation or minification has been applied and, if so, with which tools. We find that code transformations are very widespread, affecting 38% of all scripts. Most of the transformed code has been minified, whereas advanced obfuscation techniques, such as encoding parts of the code or fetching all strings from a global array, affect less than 1% of all scripts (2,842 unique scripts in total). Studying which code gets obfuscated, we find that obfuscation is particularly common in certain website categories, e.g., adult content. Further analysis of the obfuscated code shows that most of it is similar to the output produced by a single obfuscation tool and that some obfuscated scripts trigger suspicious behavior, such as likely fingerprinting and timing attacks. Finally, we show that obfuscation comes at a cost, because it slows down execution and risks producing code that deviates from the intended behavior. Overall, our study shows that the security community must consider minified and obfuscated JavaScript code, and it provides insights into what kinds of transformations to focus on. Our learned classifiers provide an automated and accurate way to identify obfuscated code, and we release a set of real-world obfuscated scripts for future research.
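To make the transformation categories named above concrete, the following hand-written sketch shows one small function in four forms: readable, minified, with its strings fetched from a global array, and with part of a literal encoded. All names (`greetOriginal`, `_s`, `_f`, and so on) are hypothetical and chosen for illustration; the code is not the output of any tool examined in the study and is not taken from its dataset.

```typescript
// Readable version: descriptive names, whitespace, a plain string literal.
function greetOriginal(userName: string): string {
  const greeting = "Hello, ";
  return greeting + userName + "!";
}

// Minified style: same logic, shortened names, whitespace removed.
function greetMinified(n: string): string{return"Hello, "+n+"!"}

// Global string array style: literals are moved into one array
// and fetched by index through a small accessor function.
const _s: string[] = ["Hello, ", "!"];
function _f(i: number): string { return _s[i]; }
function greetStringArray(n: string): string { return _f(0) + n + _f(1); }

// Encoding style: a literal is stored as character codes
// and only decoded when the function runs.
const _e: number[] = [72, 101, 108, 108, 111, 44, 32];
function greetEncoded(n: string): string {
  return _e.map((c) => String.fromCharCode(c)).join("") + n + "!";
}
```

All four functions return the same string for the same input. The minified form mainly removes names and layout, while the last two forms also hide the string contents from a simple text search, which is the property that separates obfuscation from minification in this sketch.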
