JSObfusDetector: A binary PSO-based one-class classifier ensemble to detect obfuscated JavaScript code

JavaScript code obfuscation has become a major technique used by malware writers to evade static analysis. Over the past few years, a number of dynamic analysis techniques have been proposed to detect obfuscated malicious JavaScript code at runtime. However, because of their runtime overhead, these techniques are slow and thus not widely used in practice. On the other hand, because a large amount of benign JavaScript code is also obfuscated to protect intellectual property, the intrinsic features of obfuscated JavaScript code are not reliable indicators of maliciousness for static analysis purposes. Therefore, we need to distinguish between obfuscated and non-obfuscated JavaScript code in order to devise an efficient and effective technique for detecting malicious JavaScript code. In this paper, we address this issue by presenting JSObfusDetector, a novel one-class classifier ensemble that detects obfuscated JavaScript code. To construct the classifier ensemble, we apply a binary particle swarm optimization (PSO) algorithm, called ParticlePruner, to an initial ensemble of one-class SVM classifiers in order to find a sub-ensemble whose members are both accurate and diverse in their outputs. We evaluate JSObfusDetector on a dataset of obfuscated and non-obfuscated JavaScript code. The experimental results show that JSObfusDetector achieves about 97% precision, 91% recall, and 94% F-measure.
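The ensemble-pruning step can be pictured with a small, dependency-free sketch. It assumes each base one-class SVM has already produced +1/-1 outputs on a labeled validation set (+1 = obfuscated), and it searches over 0/1 inclusion masks with the sigmoid-based binary PSO of Kennedy and Eberhart. The fitness function (a weighted sum of majority-vote accuracy and average pairwise disagreement, weighted by a hypothetical `alpha`), the tie-breaking rule, the parameter values, and all function names are illustrative assumptions; the abstract does not specify ParticlePruner's actual fitness or settings.

```python
import math
import random
from itertools import combinations


def majority_vote_accuracy(preds, labels, mask):
    """Accuracy of an unweighted majority vote over the members selected by mask."""
    members = [p for p, keep in zip(preds, mask) if keep]
    if not members:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        vote = sum(m[i] for m in members)
        y_hat = 1 if vote >= 0 else -1  # ties go to +1 (hypothetical choice)
        correct += (y_hat == y)
    return correct / len(labels)


def disagreement(preds, mask):
    """Average pairwise disagreement: fraction of samples on which two members differ."""
    members = [p for p, keep in zip(preds, mask) if keep]
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    n = len(members[0])
    return sum(sum(a != b for a, b in zip(p, q)) / n for p, q in pairs) / len(pairs)


def fitness(preds, labels, mask, alpha=0.8):
    # Hypothetical trade-off between ensemble accuracy and member diversity.
    return alpha * majority_vote_accuracy(preds, labels, mask) + (1 - alpha) * disagreement(preds, mask)


def binary_pso_prune(preds, labels, n_particles=20, iters=50, c1=2.0, c2=2.0, vmax=4.0, seed=0):
    """Return (best mask, best fitness) found by sigmoid-based binary PSO."""
    rng = random.Random(seed)
    dim = len(preds)  # one bit per base classifier
    xs = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-vmax, vmax) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [fitness(preds, labels, x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity is pulled toward the personal and global best bits.
                v = vs[i][d] + c1 * r1 * (pbest[i][d] - xs[i][d]) + c2 * r2 * (gbest[d] - xs[i][d])
                v = max(-vmax, min(vmax, v))  # velocity clamping
                vs[i][d] = v
                # The sigmoid of the velocity is the probability that the bit is 1.
                xs[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-v)) else 0
            f = fitness(preds, labels, xs[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f


if __name__ == "__main__":
    # Toy validation outputs of five base classifiers on eight samples.
    labels = [1, 1, 1, 1, -1, -1, -1, -1]
    preds = [
        [1, 1, 1, 1, -1, -1, -1, -1],
        [1, 1, 1, -1, -1, -1, -1, 1],
        [1, -1, 1, 1, -1, 1, -1, -1],
        [1, 1, -1, 1, 1, -1, -1, -1],
        [-1, 1, 1, 1, -1, -1, 1, -1],
    ]
    mask, f = binary_pso_prune(preds, labels)
    print("selected members:", mask, "fitness:", round(f, 3))
```

Encoding a sub-ensemble as a bit vector lets the swarm search all 2^n subsets without enumerating them; the `alpha` weight controls how strongly diversity is rewarded relative to accuracy.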
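As a quick consistency check on the reported figures, assuming the F-measure is the balanced F1 score (the harmonic mean of precision P and recall R), 97% precision and 91% recall do yield about 94%:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.97 \times 0.91}{0.97 + 0.91}
    = \frac{1.7654}{1.88}
    \approx 0.939 \approx 94\%
```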
