Malicious JavaScript Detection using Statistical Language Model
by Anumeha Shah

The Internet is immensely important in our day-to-day life, but it has also become a medium for infecting computers, attacking users, and distributing malicious code. Because JavaScript is the principal language of client-side programming, it is frequently used to conduct such attacks. Various approaches have been proposed to address JavaScript security issues. Some advanced approaches combine machine learning with de-obfuscation and emulation, and many analysis methods incorporate both static and dynamic analysis. Our solution is based entirely on static analysis, which avoids the runtime overhead of dynamic analysis. The central objective of this project is to integrate the work of Eunjin (EJ) Jung et al., Towards A Robust Detection of Malicious JavaScript (TARDIS), into the web browser via a Firefox add-on and to demonstrate the usability of the add-on in defending against such attacks. TARDIS uses statistical language modeling for automatic feature extraction and combines it with structural features from an abstract syntax tree [1]. We have developed a Firefox add-on that extracts JavaScript code from the visited page and classifies it as either malicious or benign. We use a pre-compiled training model stored in JavaScript Object Notation (JSON). JSON is lightweight and does not consume much memory on a user's machine. Moreover, it stores data as key-value pairs that map easily to the data structures used in modern programming languages. The principal advantage of a pre-compiled training model is better performance. Our model achieves 98% accuracy on our sample dataset.
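The idea of classifying a script with a pre-compiled model stored as JSON key-value pairs can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the add-on itself runs inside Firefox. Everything specific here is an assumption made for illustration: the JSON layout, the bigram log-probabilities, the `unseen_logprob` fallback, and the crude tokenizer are hypothetical. The sketch also omits the structural features from the abstract syntax tree, so it is not the TARDIS feature set or the add-on's actual model.

```python
import json
import re

# Hypothetical pre-compiled model: per-class bigram log-probabilities stored as
# plain key-value pairs. The layout and the numbers are invented for illustration.
MODEL_JSON = """
{
  "order": 2,
  "unseen_logprob": -8.0,
  "classes": {
    "benign":    {"= document": -2.0, "document .": -1.0, ". title": -2.5},
    "malicious": {"eval (": -1.0, "( unescape": -1.5, "unescape (": -1.0}
  }
}
"""

# Crude lexical split: identifiers, integers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def tokenize(source):
    """Split JavaScript source into rough tokens (not a real JavaScript lexer)."""
    return TOKEN_RE.findall(source)


def ngrams(tokens, order):
    """Return the space-joined n-grams of the given order."""
    return [" ".join(tokens[i:i + order]) for i in range(len(tokens) - order + 1)]


def score(source, model, label):
    """Sum the log-probabilities of the script's n-grams under one class model."""
    table = model["classes"][label]
    unseen = model["unseen_logprob"]  # fallback for n-grams missing from the table
    return sum(table.get(g, unseen) for g in ngrams(tokenize(source), model["order"]))


def classify(source, model):
    """Pick the class whose language model gives the highest log-likelihood."""
    return max(model["classes"], key=lambda label: score(source, model, label))


# Loading is a single parse; no training happens at classification time.
model = json.loads(MODEL_JSON)
print(classify("eval(unescape('%75'))", model))
print(classify("var x = document.title", model))
```

Because the model is parsed once and then consulted through dictionary lookups, classifying a script costs one pass over its tokens, which is the kind of performance benefit a pre-compiled model is meant to provide.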
[1] ChengXiang Zhai et al. Statistical Language Models for Information Retrieval: A Critical Review. Found. Trends Inf. Retr., 2008.
[2] Eunjin Jung et al. Obfuscated malicious javascript detection using classification techniques. 2009 4th International Conference on Malicious and Unwanted Software (MALWARE), 2009.
[3] Wei Xu et al. JStill: mostly static detection of obfuscated malicious JavaScript code. CODASPY, 2013.
[4] Benjamin Livshits et al. ZOZZLE: Fast and Precise In-Browser JavaScript Malware Detection. USENIX Security Symposium, 2011.
[5] Phu H. Phung et al. A two-tier sandbox architecture for untrusted JavaScript. 2012.
[6] Giovanni Vigna et al. Prophiler: a fast filter for the large-scale detection of malicious web pages. WWW, 2011.
[7] Mark Stamp et al. Information security - principles and practice. 2005.
[8] Pavel Laskov et al. Static detection of malicious JavaScript-bearing PDF documents. ACSAC '11, 2011.
[9] Marius Kloft et al. Early detection of malicious behavior in JavaScript code. AISec '12, 2012.
[10] Haining Wang et al. Characterizing insecure javascript practices on the web. WWW '09, 2009.
[11] Andreas Dewald et al. CUJO: Efficient Detection and Prevention of Drive-by-Download Attacks. Forschungsberichte der Fakultät IV – Elektrotechnik und Informatik, 2010.