Paper Information - JSAC: A Novel Framework to Detect Malicious JavaScript via CNNs over AST and CFG

JavaScript (JS) is a dominant programming language in web and mobile development, but it is also notoriously abused by attackers because of its powerful characteristics (it is dynamic, prototype-based, and multi-paradigm), which foil most static and dynamic analysis approaches. To detect malicious JS instances, several machine learning-based methods have been developed recently. However, these methods treat JS as a natural language rather than a programming language, and therefore cannot capture its syntactic and semantic features.

In this paper, we present JSAC, a novel framework to detect JS malware. It combines deep learning and program analysis techniques to capture the syntactic and semantic features of JS programs. Specifically, to obtain a JS program’s syntactic information, we build its abstract syntax tree (AST) and employ a tree-based convolutional neural network (CNN) to extract features from it. To obtain its semantic information, we construct its control flow graph (CFG) and feed it to a second, graph-based CNN. Finally, the features extracted by the two CNNs are fused for the final detection. Evaluation on a corpus of 69,523 JS files indicates that JSAC outperforms four other models, achieving an F1-score of 98.73% in detecting JS malware.
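
The two-branch structure described in the abstract (one branch over the AST, one over the CFG, fused before classification) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy AST and CFG are written by hand rather than produced by a JS parser, the two feature functions are simple stand-ins and not the tree-based and graph-based CNNs of the paper, fusion is assumed to be plain concatenation, and the weights are arbitrary rather than learned.

```python
import math

# Hypothetical toy AST as (node_type, [children]); not output of a real JS parser.
TOY_AST = ("Program", [
    ("VariableDeclaration", [("Identifier", []), ("Literal", [])]),
    ("CallExpression", [("Identifier", []), ("Identifier", [])]),
])

# Hypothetical toy CFG: basic-block id -> list of successor block ids.
TOY_CFG = {0: [1, 2], 1: [3], 2: [3], 3: []}

# Assumed node-type vocabulary for the toy AST above.
AST_VOCAB = ["Program", "VariableDeclaration", "CallExpression",
             "Identifier", "Literal"]


def ast_features(tree):
    """Stand-in for the AST branch: a normalized node-type histogram."""
    counts = dict.fromkeys(AST_VOCAB, 0)
    total = 0
    stack = [tree]
    while stack:
        node_type, children = stack.pop()
        counts[node_type] += 1
        total += 1
        stack.extend(children)
    return [counts[t] / total for t in AST_VOCAB]


def cfg_features(cfg):
    """Stand-in for the CFG branch: edges per block and branching-block ratio."""
    n_blocks = len(cfg)
    n_edges = sum(len(succs) for succs in cfg.values())
    n_branching = sum(1 for succs in cfg.values() if len(succs) > 1)
    return [n_edges / n_blocks, n_branching / n_blocks]


def fuse_and_score(ast_vec, cfg_vec, weights, bias):
    """Fuse by concatenation (an assumption), then apply a logistic unit."""
    fused = ast_vec + cfg_vec  # list concatenation
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return fused, 1.0 / (1.0 + math.exp(-z))


# Arbitrary illustrative weights; a trained model would learn these.
WEIGHTS = [0.0, 0.1, 0.8, 0.2, -0.3, 0.5, 0.4]
fused, score = fuse_and_score(ast_features(TOY_AST), cfg_features(TOY_CFG),
                              WEIGHTS, bias=-0.5)
print(len(fused), score)
```

The point of the sketch is only the data flow: each program yields two independent feature vectors, and the classifier sees both at once. In a real system the stand-in functions would be replaced by learned networks and the score would be thresholded to label a file as malicious or benign.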
