JSAC: A Novel Framework to Detect Malicious JavaScript via CNNs over AST and CFG

JavaScript (JS) is a dominant programming language in web and mobile development, but it is also notoriously abused by attackers because of its powerful characteristics (e.g., it is dynamic, prototype-based, and multi-paradigm), which foil most static and dynamic analysis approaches. Several machine learning-based methods have recently been developed to detect malicious JS instances. However, these methods treat JS as a natural language rather than a programming language, and therefore cannot capture its syntactic and semantic features. In this paper, we present JSAC, a novel framework for detecting JS malware that combines deep learning and program analysis techniques to capture the syntactic and semantic features of JS programs. Specifically, to obtain a JS program's syntactic information, we build its abstract syntax tree (AST) and employ a tree-based convolutional neural network (CNN) to extract features from it. To obtain its semantic information, we construct its control flow graph (CFG) and feed it to a separate graph-based CNN. Finally, the features extracted by the two CNNs are fused for the final detection. Evaluation on a corpus of 69,523 JS files shows that JSAC outperforms four other models, achieving a 98.73% F1-score in detecting JS malware.
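The three-stage design described above (a tree CNN over the AST, a graph CNN over the CFG, and feature fusion) can be illustrated with the following minimal, dependency-free Python sketch. It is not JSAC's implementation and rests on several assumptions. The toy AST and CFG are hand-written rather than produced by a JavaScript parser. The weights are random and untrained. The tree convolution is a simplified variant in which each window is a node plus its direct children, with left/right weight matrices mixed by the child's position. The graph layer is a single mean-aggregation, GCN-style layer in which each basic block is reduced to one type label. Fusion is plain concatenation followed by a logistic output. All names (`tree_conv`, `graph_conv`, `classify`, `DIM`, and so on) are hypothetical.

```python
import math
import random

DIM = 4  # hypothetical embedding/feature size; real models use far larger sizes
rng = random.Random(0)  # fixed seed so the sketch is deterministic


def rand_matrix(rows, cols):
    """Random weight matrix (untrained; stands in for learned parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def scale(s, v):
    return [s * a for a in v]


def max_pool(vectors):
    """Element-wise max over any number of vectors -> one fixed-size vector."""
    return [max(col) for col in zip(*vectors)]


EMBED = {}  # hypothetical node-type embeddings; a real system would learn these


def embed(node_type):
    if node_type not in EMBED:
        EMBED[node_type] = [rng.uniform(-1, 1) for _ in range(DIM)]
    return EMBED[node_type]


W_TOP, W_LEFT, W_RIGHT = (rand_matrix(DIM, DIM) for _ in range(3))
W_GRAPH = rand_matrix(DIM, DIM)
W_OUT = [rng.uniform(-0.5, 0.5) for _ in range(2 * DIM)]


def tree_conv(node, out):
    """Simplified tree convolution: window = node + direct children; each child
    mixes W_LEFT and W_RIGHT according to its position among its siblings."""
    node_type, children = node
    h = matvec(W_TOP, embed(node_type))
    n = len(children)
    for i, child in enumerate(children):
        eta_r = i / (n - 1) if n > 1 else 0.5
        x = embed(child[0])
        h = add(h, scale(1 - eta_r, matvec(W_LEFT, x)))
        h = add(h, scale(eta_r, matvec(W_RIGHT, x)))
        tree_conv(child, out)  # slide the window over the whole tree
    out.append([math.tanh(a) for a in h])
    return out


def graph_conv(block_types, edges):
    """One GCN-style layer over a CFG: each block averages itself and its
    neighbors (edges treated as undirected), then applies W_GRAPH and tanh."""
    n = len(block_types)
    neighbors = [{i} for i in range(n)]  # self-loops
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    feats = [embed(t) for t in block_types]
    out = []
    for i in range(n):
        agg = [0.0] * DIM
        for j in neighbors[i]:
            agg = add(agg, feats[j])
        agg = scale(1 / len(neighbors[i]), agg)
        out.append([math.tanh(a) for a in matvec(W_GRAPH, agg)])
    return out


def classify(ast, cfg_blocks, cfg_edges):
    syntax_feat = max_pool(tree_conv(ast, []))
    semantic_feat = max_pool(graph_conv(cfg_blocks, cfg_edges))
    fused = syntax_feat + semantic_feat  # fusion by concatenation
    logit = sum(w * f for w, f in zip(W_OUT, fused))
    return fused, 1 / (1 + math.exp(-logit))  # "maliciousness" score in (0, 1)


# Toy AST for `if (x) { eval(y); }` (hand-written, not real parser output).
ast = ("Program", [
    ("IfStatement", [
        ("Identifier", []),
        ("BlockStatement", [
            ("ExpressionStatement", [
                ("CallExpression", [("Identifier", []), ("Identifier", [])]),
            ]),
        ]),
    ]),
])
# Toy CFG: entry -> condition -> (call | exit), call -> exit.
blocks = ["Entry", "IfStatement", "CallExpression", "Exit"]
edges = [(0, 1), (1, 2), (1, 3), (2, 3)]

fused, score = classify(ast, blocks, edges)
print(len(fused), round(score, 3))  # prints 8 and a score strictly between 0 and 1
```

Max pooling is what lets ASTs and CFGs of arbitrary size map to fixed-length vectors that can be concatenated. In a trained system, the weights and embeddings would instead be learned from labeled benign and malicious samples, for example with a binary cross-entropy loss.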
