Detecting Malicious JavaScript Using Structure-Based Analysis of Graph Representation

Malicious JavaScript code in web applications poses a significant threat, as attackers exploit it to carry out a wide range of malicious activities. Detecting these scripts is challenging because of their diversity and the continuous evolution of attack techniques. Most existing approaches model a script through static or sequential features, which adapt poorly to new attack techniques and capture little of the script’s semantic meaning. To address this issue, we propose an alternative approach that leverages the abstract syntax tree (AST) representation of JavaScript code and focuses on its distinctive syntactic structure. The proposed approach uses graph neural networks, based on neural message passing with neighborhood aggregation, to extract structural features from the AST graph while also taking the attributes of individual nodes into account. By encoding both the local AST graph structure and the node attributes, the method captures the source code’s semantic meaning and exploits signature structures in the AST representation. In extensive experiments on two different datasets, the proposed method consistently achieved high detection performance, with accuracy scores of 99.4% and 96.92%. These results demonstrate the effectiveness of our approach: it achieved detection rates above 81% across various attack types and an almost twofold performance improvement on JS-Droppers compared to the sequence-based approach. In addition, we observed that the AST graph structure reflects the code’s semantic meaning, exhibiting distinctive patterns and signatures that the proposed method captures effectively.
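As a rough illustration of what "message passing with neighborhood aggregation" over an AST graph means, the following minimal Python sketch runs one unweighted aggregation round on a toy tree. It is not the authors' architecture: the hand-written AST (ESTree-style node types for a call such as `eval(payload)`), the one-hot node attributes, the sum aggregator, and the helper names `flatten`, `message_pass`, and `readout` are all assumptions made for illustration, and the learnable weights and nonlinearities of a real graph neural network are omitted.

```python
# Toy AST for a call like `eval(payload)`, hand-written in ESTree style.
# Assumption: a real pipeline would obtain the tree from a JavaScript parser.
TOY_AST = {
    "type": "Program",
    "children": [
        {"type": "ExpressionStatement", "children": [
            {"type": "CallExpression", "children": [
                {"type": "Identifier", "children": []},  # callee
                {"type": "Identifier", "children": []},  # argument
            ]},
        ]},
    ],
}

# Hypothetical node-type vocabulary used to build one-hot node attributes.
NODE_TYPES = ["Program", "ExpressionStatement", "CallExpression", "Identifier"]


def flatten(ast):
    """Depth-first traversal returning (node type list, parent-child edges)."""
    types, edges = [], []

    def visit(node, parent):
        idx = len(types)
        types.append(node["type"])
        if parent is not None:
            edges.append((parent, idx))
        for child in node["children"]:
            visit(child, idx)

    visit(ast, None)
    return types, edges


def one_hot(node_type):
    """Encode a node type as a one-hot attribute vector."""
    return [1.0 if t == node_type else 0.0 for t in NODE_TYPES]


def message_pass(features, edges):
    """One round: new feature = own feature + sum of neighbor features."""
    neighbors = [[] for _ in features]
    for u, v in edges:  # treat AST edges as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in neighbors[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        updated.append(agg)
    return updated


def readout(features):
    """Sum-pool node features into a single graph-level vector."""
    return [sum(column) for column in zip(*features)]


types, edges = flatten(TOY_AST)
h0 = [one_hot(t) for t in types]   # initial node attributes
h1 = message_pass(h0, edges)       # structure-aware node features
graph_vector = readout(h1)         # input to a downstream classifier
print(graph_vector)
```

After one round, each node's vector mixes its own type with the types of its parent and children, so the pooled graph vector depends on how node types are connected, not only on how often they occur. A trained model would stack several such rounds with learned transformations before classification.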
