JSAC: A Novel Framework to Detect Malicious JavaScript via CNNs over AST and CFG

JavaScript (JS) is a dominant programming language in web and mobile development, but it is also notoriously abused by attackers because of its powerful characteristics (it is dynamic, prototype-based, and multi-paradigm), which foil most static and dynamic analysis approaches. To detect malicious JS instances, several machine learning-based methods have been developed recently. However, these methods treat JS as a natural language rather than a programming language, and therefore cannot capture its syntactic and semantic features. In this paper, we present JSAC, a novel framework for detecting JS malware. It combines deep learning and program analysis techniques to capture the syntactic and semantic features of JS programs. Specifically, to obtain a JS program’s syntactic information, we build its abstract syntax tree (AST) and employ a tree-based convolutional neural network (CNN) to extract features from it. To obtain its semantic information, we construct its control flow graph (CFG) and feed it to a second, graph-based CNN. Finally, the features extracted by the two CNNs are fused for final detection. An evaluation on a corpus of 69,523 JS files indicates that JSAC outperforms four other models, achieving a 98.73% F1-score in detecting JS malware.
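The final fusion step can be illustrated with a minimal sketch. The text above does not specify how the two feature vectors are combined, so concatenation followed by a single logistic unit is assumed here purely for illustration. The names `fuse` and `malicious_probability`, the vector sizes, and all weights are hypothetical and do not come from JSAC itself.

```python
import math
from typing import List, Sequence


def fuse(ast_features: Sequence[float], cfg_features: Sequence[float]) -> List[float]:
    """Concatenate the two branch outputs (assumed fusion strategy)."""
    return list(ast_features) + list(cfg_features)


def malicious_probability(
    fused: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Score the fused vector with one logistic unit (a stand-in classifier)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical outputs of the tree-based CNN (AST) and graph-based CNN (CFG).
    ast_features = [0.2, 0.7]
    cfg_features = [0.9]
    fused = fuse(ast_features, cfg_features)
    # Hypothetical learned parameters; a real model would train these.
    score = malicious_probability(fused, weights=[0.5, -0.3, 1.2], bias=-0.4)
    print(f"fused={fused} score={score:.3f}")
```

In this sketch the syntactic and semantic branches stay independent until the last step, so either feature extractor could be replaced without changing the classifier interface.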

