Research on Malicious JavaScript Detection Technology Based on LSTM

Attackers inject malicious JavaScript into web pages to implant Trojan horses, spread viruses, conduct phishing, and steal secret information. Based on an analysis of existing research on malicious JavaScript detection, this paper proposes a detection model based on LSTM (Long Short-Term Memory). Features are extracted at the semantic level of the bytecode, and the word-vector method is optimized. The model distinguishes malicious JavaScript code effectively and is robust against obfuscated code. Experiments show that the LSTM-based detection model achieves an accuracy of 99.51% and an F1-score of 98.37%, outperforming existing models based on Random Forest and SVM.
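The pipeline summarized above (bytecode tokens mapped to word vectors, fed through an LSTM, then classified) can be illustrated with a minimal forward-pass sketch. This is not the paper's implementation: the opcode names in VOCAB, the class name TinyLSTMClassifier, the layer sizes, and the random untrained weights are all assumptions chosen for illustration, and a real system would learn the embeddings and gate weights from labeled samples.

```python
import math
import random

# Hypothetical opcode vocabulary; the paper's actual bytecode tokens are not given here.
VOCAB = ["<unk>", "LOAD", "STORE", "CALL", "STRCAT", "EVAL"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Untrained single-layer LSTM followed by a logistic output unit."""

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.hidden_dim = hidden_dim
        # Embedding table: one word vector per bytecode token.
        self.embed = rand_matrix(len(vocab), embed_dim, rng)
        # One weight matrix per gate, applied to the concatenation [x; h]:
        # i = input gate, f = forget gate, o = output gate, g = candidate cell.
        self.W = {g: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng) for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def step(self, x, h, c):
        """One LSTM time step: returns the new hidden and cell states."""
        z = x + h  # list concatenation [x; h]
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])] for g in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, tokens):
        """Score a token sequence; the output is a value in (0, 1)."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for tok in tokens:
            x = self.embed[self.index.get(tok, self.index["<unk>"])]
            h, c = self.step(x, h, c)
        logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


if __name__ == "__main__":
    model = TinyLSTMClassifier(VOCAB)
    # With untrained weights the score is meaningless; this only shows the data flow.
    print(model.predict_proba(["LOAD", "STRCAT", "EVAL"]))
```

Because the classifier reads the final hidden state, the score depends on the order of the bytecode tokens and not only on their counts, which is the property that motivates using a recurrent model over a bag-of-features classifier.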
