An SMT-LIB Format for Sequences and Regular Expressions

Strings are ubiquitous in software. Tools for verification and testing of software rely to varying degrees on reasoning about strings. Web applications are particularly important in this context since they tend to be string-heavy and have a large number of security errors attributable to improper string sanitization and manipulation. In recent years, many string solvers have been implemented to address the analysis needs of verification, testing and security tools aimed at string-heavy applications. These solvers support a basic representation of strings, functions such as concatenation and extraction, and predicates such as equality and membership in regular expressions. However, the syntax and semantics supported by the current crop of string solvers are mutually incompatible. Hence, there is an acute need for a standardized theory of strings (i.e., SMT-LIBization of a theory of strings) that supports a core set of functions, predicates and string representations. This paper presents a proposal for exactly such a standardization effort, i.e., an SMT-LIBization of strings and regular expressions. It introduces a theory of sequences generalizing strings, and builds a theory of regular expressions on top of sequences. The proposed logic QF_BVRE is designed to capture a common substrate among existing tools for string constraint solving.
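
To make the layering of regular expressions on top of sequences concrete, the following is a minimal Python sketch, not the concrete syntax or semantics of the proposal. All names here (`Regex`, `Elem`, `Concat`, `Union`, `Star`, `in_re`) are hypothetical and chosen for illustration only. Membership is decided with Brzozowski derivatives, which is one standard technique and is an assumption of this sketch rather than a method prescribed by the paper. Elements are arbitrary hashable values, so a string is simply a sequence whose elements happen to be character codes.

```python
from dataclasses import dataclass
from typing import Hashable, Sequence


class Regex:
    """Base class for regular expressions over sequences of arbitrary elements."""


@dataclass(frozen=True)
class Empty(Regex):
    """Matches no sequence at all."""


@dataclass(frozen=True)
class Epsilon(Regex):
    """Matches only the empty sequence."""


@dataclass(frozen=True)
class Elem(Regex):
    """Matches the one-element sequence [value]."""
    value: Hashable


@dataclass(frozen=True)
class Concat(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Union(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r: Regex) -> bool:
    """Return True if r accepts the empty sequence."""
    if isinstance(r, (Epsilon, Star)):
        return True
    if isinstance(r, Concat):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Union):
        return nullable(r.left) or nullable(r.right)
    return False  # Empty and Elem never accept the empty sequence


def derivative(r: Regex, x: Hashable) -> Regex:
    """Return a regex for the suffixes s such that [x] + s is accepted by r."""
    if isinstance(r, Elem):
        return Epsilon() if r.value == x else Empty()
    if isinstance(r, Concat):
        head = Concat(derivative(r.left, x), r.right)
        # If the left part can be empty, x may also start the right part.
        return Union(head, derivative(r.right, x)) if nullable(r.left) else head
    if isinstance(r, Union):
        return Union(derivative(r.left, x), derivative(r.right, x))
    if isinstance(r, Star):
        return Concat(derivative(r.inner, x), r)
    return Empty()  # Empty and Epsilon have no non-empty members


def in_re(seq: Sequence[Hashable], r: Regex) -> bool:
    """Membership predicate: consume seq one element at a time."""
    for x in seq:
        r = derivative(r, x)
    return nullable(r)


# (0x61 0x62)* over small integers standing in for 8-bit elements.
ab_star = Star(Concat(Elem(0x61), Elem(0x62)))
accepted = in_re([0x61, 0x62, 0x61, 0x62], ab_star)  # True
rejected = in_re([0x61], ab_star)                    # False
```

Because the element type is left open, the same predicate works unchanged for sequences of characters, integers, or tuples; fixing the element type to fixed-width bit patterns is the kind of specialization the name QF_BVRE suggests.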
