Simple linear string constraints

Modern web applications often suffer from command injection attacks. Even when equipped with sanitization code, many systems can be penetrated because of software bugs. It is therefore desirable to discover such vulnerabilities automatically, given the bytecode of a web application. One approach is to symbolically execute the target system and construct constraints that combine the path conditions with the attack patterns. Solving these constraints yields an attack signature from which the attack can be replayed. Constraint solving is the key to symbolic execution. For web applications, string constraints receive most of the attention, because web applications are essentially text-processing programs. We present the simple linear string equation (SISE), a decidable fragment of the general string constraint system. SISE models a collection of regular replacement operations, such as greedy, reluctant, declarative, and finite replacement, which text-processing programs use frequently. We propose several automata techniques for simulating procedural semantics such as left-most matching. We show that a recursive algorithm, which composes the atomic transducers of a SISE, can compute the solution pool: the value range of each variable across concrete solutions. A concrete solution for each variable can then be synthesized from the solution pool. To improve solver performance, we develop a symbolic representation of finite-state transducers, which allows the constraint solver to support a 16-bit Unicode alphabet in practice. The algorithm is implemented in SUSHI, a constraint solver written in Java. We compare the applicability and performance of SUSHI with those of Kaluza, a bounded string solver.
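
The replacement semantics that SISE models are applied procedurally by ordinary regex engines. The short sketch below uses Python's standard `re` module, not SUSHI, to show left-most, greedy, and reluctant replacement on concrete strings. Java's `String.replaceAll` applies the same left-most, non-overlapping matching and supports the same reluctant quantifier syntax (`+?`). The variable names are illustrative only.

```python
import re

# Left-most, non-overlapping matching: scanning starts at index 0, so "aa"
# is matched at positions 0-1 and the trailing "a" is left unchanged.
leftmost = re.sub(r"aa", "b", "aaa")        # "ba", never "ab"

# Greedy quantifier: at the left-most match position, take the longest match.
greedy = re.sub(r"a+", "b", "xaaay")        # "xby"

# Reluctant quantifier: take the shortest non-empty match, then resume scanning.
reluctant = re.sub(r"a+?", "b", "xaaay")    # "xbbby"
```

A model that ignored left-most matching and allowed any occurrence of "aa" to be rewritten would also admit "ab" as a result of the first call, which no execution of the program can produce.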

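The idea of a symbolic transducer can be illustrated with a toy sketch, under stated assumptions: transitions are labeled with inclusive character ranges over the 16-bit alphabet instead of single characters, so a sanitizer that doubles single quotes needs three transitions rather than one per character. The names `Transition`, `SymbolicTransducer`, `preimage`, and `escape_quotes` are hypothetical and are not SUSHI's API. `preimage` computes every input x satisfying one equation of the form T(x) = target, returned as sequences of character ranges, which is a miniature analogue of a solution pool. It assumes every transition emits at least one character and does not reproduce SUSHI's recursive algorithm over composed transducers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_CHAR = 0xFFFF  # 16-bit alphabet; inputs are assumed to lie in the BMP


@dataclass(frozen=True)
class Transition:
    src: int
    lo: int                # inclusive lower bound of the input character range
    hi: int                # inclusive upper bound of the input character range
    output: Optional[str]  # None = copy the input character; else emit this fixed string
    dst: int


@dataclass
class SymbolicTransducer:
    initial: int
    finals: frozenset
    transitions: List[Transition]

    def apply(self, s: str) -> Optional[str]:
        """Run the transducer (assumed deterministic); None if the input is rejected."""
        state, out = self.initial, []
        for ch in s:
            code = ord(ch)
            t = next((t for t in self.transitions
                      if t.src == state and t.lo <= code <= t.hi), None)
            if t is None:
                return None  # no transition covers this character
            out.append(ch if t.output is None else t.output)
            state = t.dst
        return "".join(out) if state in self.finals else None

    def preimage(self, target: str) -> List[List[Tuple[int, int]]]:
        """All inputs x with apply(x) == target, each as a sequence of (lo, hi)
        ranges. Terminates only if every transition emits at least one char."""
        results: List[List[Tuple[int, int]]] = []

        def search(state: int, pos: int, acc: List[Tuple[int, int]]) -> None:
            if pos == len(target):
                if state in self.finals:
                    results.append(list(acc))
                return
            for t in self.transitions:
                if t.src != state:
                    continue
                if t.output is None:
                    # Identity edge: the input char must equal the next output char.
                    code = ord(target[pos])
                    if t.lo <= code <= t.hi:
                        search(t.dst, pos + 1, acc + [(code, code)])
                elif target.startswith(t.output, pos):
                    # Fixed-output edge: any char in [lo, hi] produces t.output.
                    search(t.dst, pos + len(t.output), acc + [(t.lo, t.hi)])

        search(self.initial, 0, [])
        return results


QUOTE = ord("'")
# Hypothetical sanitizer: double every single quote, copy everything else.
escape_quotes = SymbolicTransducer(
    initial=0,
    finals=frozenset({0}),
    transitions=[
        Transition(0, 0x0000, QUOTE - 1, None, 0),
        Transition(0, QUOTE, QUOTE, "''", 0),
        Transition(0, QUOTE + 1, MAX_CHAR, None, 0),
    ],
)
```

Because this sanitizer never emits an isolated quote, `escape_quotes.preimage("a'b")` is empty: no input can make the escaped string contain an unpaired quote, so the corresponding attack constraint is unsatisfiable. By contrast, `escape_quotes.preimage("a''b")` yields the single solution `a'b`, given as the range sequence `[(97, 97), (39, 39), (98, 98)]`.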