Finding your way in the testing jungle: a learning approach to web security testing

Black-box security testing of web applications is a hard problem. The main complication lies in the black-box assumption: The testing tool has limited insight into the workings of server-side defenses. This has traditionally led commercial as well as research vulnerability scanners toward heuristic approaches, such as testing each input point (e.g. HTTP parameter) with a short, predefined list of effective test payloads to balance between coverage and performance. We take a fresh approach to the problem of security testing, casting it into a learning setting. In our approach, the testing algorithm has available a comprehensive database of test payloads, such that if the web application's defenses are broken, then with near certainty one of the candidate payloads is able to demonstrate the vulnerability. The question then becomes how to efficiently search through the payload space to find a good candidate. In our solution, the learning algorithm infers from a failed test---by analyzing the website's response---which other payloads are also likely to fail, thereby pruning substantial portions of the search space. We have realized our approach in XSS Analyzer, an industry-level cross-site scripting (XSS) scanner featuring 500,000,000 test payloads. Our evaluation on 15,552 benchmarks shows solid results: XSS Analyzer achieves >99% coverage relative to brute-force traversal over all payloads, while trying only 10 payloads on average per input point. XSS Analyzer also outperforms several competing algorithms, including a mature commercial algorithm---featured in IBM Security AppScan Standard V8.5---by a far margin. XSS Analyzer has recently been integrated into the latest version of AppScan (V8.6) instead of that algorithm.

[1]  Yasuhiko Minamide,et al.  Static approximation of dynamically generated Web pages , 2005, WWW '05.

[2]  Christopher Krügel,et al.  Enemy of the State: A State-Aware Black-Box Web Vulnerability Scanner , 2012, USENIX Security Symposium.

[3]  Manu Sridharan,et al.  TAJ: effective taint analysis of web applications , 2009, PLDI '09.

[4]  Christopher Krügel,et al.  SecuBat: a web vulnerability scanner , 2006, WWW '06.

[5]  John C. Mitchell,et al.  State of the Art: Automated Black-Box Web Application Vulnerability Testing , 2010, 2010 IEEE Symposium on Security and Privacy.

[6]  Monica S. Lam,et al.  Automatic Generation of XSS and SQL Injection Attacks with Goal-Directed Model Checking , 2008, USENIX Security Symposium.

[7]  Christopher Krügel,et al.  Leveraging User Interactions for In-Depth Testing of Web Applications , 2008, RAID.

[8]  Patrick Cousot,et al.  Andromeda: Accurate and Scalable Security Analysis of Web Applications , 2013, FASE.

[9]  Benjamin Livshits,et al.  Finding Security Vulnerabilities in Java Applications with Static Analysis , 2005, USENIX Security Symposium.

[10]  Michael D. Ernst,et al.  Automatic creation of SQL Injection and cross-site scripting attacks , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[11]  Christopher Krügel,et al.  Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[12]  Hahn-Ming Lee,et al.  Structural Learning of Attack Vectors for Generating Mutated XSS Attacks , 2010, TAV-WEB.

[13]  Christopher Krügel,et al.  Cross Site Scripting Prevention with Dynamic Data Tainting and Static Analysis , 2007, NDSS.

[14]  Marco Pistoia,et al.  Saving the world wide web from vulnerable JavaScript , 2011, ISSTA '11.

[15]  Dawson R. Engler,et al.  Using programmer-written compiler extensions to catch security holes , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[16]  Marco Pistoia,et al.  Path- and index-sensitive string analysis based on monadic second-order logic , 2011, ISSTA '11.

[17]  Z. Zabinsky Random Search Algorithms , 2010 .

[18]  Eduardo F. Morales,et al.  An Introduction to Reinforcement Learning , 2011 .