Fast and Precise Sanitizer Analysis with BEK

Web applications often use special string-manipulating sanitizers on untrusted user data, but it is difficult to reason manually about the behavior of these functions, leading to errors. For example, the Internet Explorer cross-site scripting filter turned out to transform some web pages without JavaScript into web pages with valid Java-Script, enabling attacks. In other cases, sanitizers may fail to commute, rendering one order of application safe and the other dangerous. BEK is a language and system for writing sanitizers that enables precise analysis of sanitizer behavior, including checking idempotence, commutativity, and equivalence. For example, BEK can determine if a target string, such as an entry on the XSS Cheat Sheet, is a valid output of a sanitizer. If so, our analysis synthesizes an input string that yields that target. Our language is expressive enough to capture real web sanitizers used in ASP.NET, the Internet Explorer XSS Filter, and the Google AutoEscape framework, which we demonstrate by porting these sanitizers to BEK. Our analyses use a novel symbolic finite automata representation to leverage fast satisfiability modulo theories (SMT) solvers and are quick in practice, taking fewer than two seconds to check the commutativity of the entire set of Internet Exporer XSS filters, between 36 and 39 seconds to check implementations of HTMLEncode against target strings from the XSS Cheat Sheet, and less than ten seconds to check equivalence between all pairs of a set of implementations of HTMLEncode. Programs written in BEK can be compiled to traditional languages such as JavaScript and C#, making it possible for web developers to write sanitizers supported by deep analysis, yet deploy the analyzed code directly to real applications.

[1]  David J. Goodman,et al.  Personal Communications , 1994, Mobile Communications.

[2]  Grzegorz Rozenberg,et al.  Handbook of formal languages, vol. 1: word, language, grammar , 1997 .

[3]  Gertjan van Noord,et al.  Finite State Transducers with Predicates and Identities , 2001, Grammars.

[4]  Aske Simon Christensen,et al.  Precise Analysis of String Expressions , 2003, SAS.

[5]  Alan J. Demers,et al.  On some decidable properties of finite state translations , 2004, Acta Informatica.

[6]  Yasuhiko Minamide Static approximation of dynamically generated Web pages , 2005, WWW '05.

[7]  Benjamin Livshits,et al.  Finding Security Vulnerabilities in Java Applications with Static Analysis , 2005, USENIX Security Symposium.

[8]  Anh Nguyen-Tuong,et al.  Automatically Hardening Web Applications Using Precise Tainting , 2005, SEC.

[9]  Christopher Krügel,et al.  Pixy: a static analysis tool for detecting Web application vulnerabilities , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[10]  Alexander Aiken,et al.  Static Detection of Security Vulnerabilities in Scripting Languages , 2006, USENIX Security Symposium.

[11]  Chen-Nee Chuah,et al.  FIREMAN: a toolkit for firewall modeling and analysis , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[12]  Z. Su,et al.  Sound and precise analysis of web applications for injection vulnerabilities , 2007, PLDI '07.

[13]  Nikolaj Bjørner,et al.  Z3: An Efficient SMT Solver , 2008, TACAS.

[14]  Christopher Krügel,et al.  Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[15]  Hiroshi Inamura,et al.  Dynamic test input generation for web applications , 2008, ISSTA '08.

[16]  Michael D. Ernst,et al.  HAMPI: a solver for string constraints , 2009, ISSTA.

[17]  Westley Weimer,et al.  A decision procedure for subset constraints over regular languages , 2009, PLDI '09.

[18]  Nikolaj Bjørner,et al.  Path Feasibility Analysis for String-Manipulating Programs , 2009, TACAS.

[19]  Benjamin Livshits,et al.  Merlin: specification inference for explicit information flow problems , 2009, PLDI '09.

[20]  Pieter Hooimeijer,et al.  Decision Procedures for String Constraints , 2010 .

[21]  Steve Hanna,et al.  A Symbolic Execution Framework for JavaScript , 2010, 2010 IEEE Symposium on Security and Privacy.

[22]  Benjamin Livshits,et al.  SCRIPTGARD: Preventing Script Injection Attacks in Legacy Web Applications with Automatic Sanitization , 2010 .

[23]  Dawn Xiaodong Song,et al.  Inference and analysis of formal models of botnet command and control protocols , 2010, CCS '10.

[24]  Margus Veanes,et al.  Rex: Symbolic Regular Expression Explorer , 2010, 2010 Third International Conference on Software Testing, Verification and Validation.

[25]  Nikolaj Bjørner,et al.  Symbolic Automata Constraint Solving , 2010, LPAR.

[26]  Westley Weimer,et al.  Solving string constraints lazily , 2010, ASE.

[27]  Naoki Kobayashi,et al.  Higher-order multi-parameter tree transducers and recursion schemes for program verification , 2010, POPL '10.

[28]  Collin Jackson,et al.  Regular expressions considered harmful in client-side XSS filters , 2010, WWW '10.

[29]  Frank Tip,et al.  Finding Bugs in Web Applications Using Dynamic Test Generation and Explicit-State Model Checking , 2010, IEEE Transactions on Software Engineering.

[30]  Pavol Cerný,et al.  Streaming transducers for algorithmic verification of single-pass list-processing programs , 2010, POPL '11.

[31]  Nivat G. Päun,et al.  Handbook of Formal Languages , 2013, Springer Berlin Heidelberg.