A Practical Guide for Detecting the Java Script-Based Malware Using Hidden Markov Models and Linear Classifiers

The World Wide Web has evolved so rapidly that it is no longer considered a luxury but a necessity. As a result, the most popular infection vectors used by cyber criminals today are web pages and commonly used documents (such as PDF files). In both cases, the malicious actions are written in JavaScript, which has therefore become the preferred language for spreading malware. To stop malicious content from executing, detecting its infection vector is crucial. In this paper we propose several methods for detecting JavaScript-based attack vectors. To achieve this goal, we first need to counter the metamorphism techniques commonly used in malicious JavaScript code, which are by no means trivial: garbage instruction insertion, variable renaming, equivalent instruction substitution, function permutation, instruction reordering, and so on. Our approach to metamorphism starts by splitting the JavaScript content into components and filtering out the insignificant ones. We then use a data set of over one million JavaScript files to test several machine learning algorithms for malware detection: Hidden Markov Models, linear classifiers, and hybrid approaches. Finally, we analyze these detection methods from a practical point of view, emphasizing the need for a very low false positive rate and the ability to train on large datasets.
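The idea of splitting JavaScript content into components and filtering out the insignificant ones can be illustrated with a minimal sketch. The abstract does not specify the actual components or filters, so everything below is an assumption made for illustration: the token classes, the small `KEYWORDS` set, and the hypothetical `normalize` function. The sketch drops comments and collapses identifiers and literals into placeholders, which makes the resulting sequence insensitive to variable renaming and to some forms of garbage insertion.

```python
import re

# Assumption: a deliberately small keyword set; a real tokenizer would
# cover the full language.
KEYWORDS = {"var", "function", "return", "if", "else", "for", "while", "new"}

# One named group per coarse token class (all classes are illustrative).
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>[^\sA-Za-z0-9_$])
""", re.VERBOSE | re.DOTALL)


def normalize(source):
    """Return a list of normalized tokens for a JavaScript source string."""
    out = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "comment":
            continue              # filtered: carries no behavior
        if kind == "string":
            out.append("STR")     # literal contents are discarded
        elif kind == "number":
            out.append("NUM")
        elif kind == "word":
            # Keywords are kept; every other name becomes a placeholder,
            # so renaming a variable does not change the sequence.
            out.append(text if text in KEYWORDS else "ID")
        else:
            out.append(text)      # operators and punctuation kept as-is
    return out


# Two variants that differ only in naming, spacing, and comments
# normalize to the same token sequence.
variant_a = normalize("var a = 1; // counter")
variant_b = normalize("var zz=42;/* junk */")
print(variant_a == variant_b)
```

A sequence like this could then serve as the observation sequence for a Hidden Markov Model or be turned into count features for a linear classifier; which representation the paper actually uses is not stated in the abstract.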
