What Is Decidable about String Constraints with the ReplaceAll Function

The theory of strings with concatenation has been widely advocated as a basis of constraint solving for verifying string-manipulating programs. However, this theory is far from adequate for expressing many string constraints that are also needed in practice, for example regular constraints (pattern matching against a regular expression) and the string-replace function (replacing either the first occurrence or all occurrences of a "pattern" string constant/variable/regular expression by a "replacement" string constant/variable), among many others. Both regular constraints and the string-replace function are crucial for applications such as the analysis of JavaScript (or, more generally, HTML5 applications) against cross-site scripting (XSS) vulnerabilities, which motivates us to consider a richer class of string constraints. The importance of the string-replace function (especially the replace-all facility) is increasingly recognised, as witnessed by its incorporation into the input languages of several string constraint solvers. Recently, it was shown that any theory of strings containing the string-replace function (even the most restricted version, where the pattern and replacement strings are both constant strings) becomes undecidable if we do not impose some kind of straight-line (aka acyclicity) restriction on the formulas. This restriction is nevertheless practically sensible, since it is typically met by string constraints that are generated by symbolic execution. In this paper, we provide the first systematic study of straight-line string constraints with the string-replace function and regular constraints as the basic operations. We show that a large class of such constraints (i.e. when only a constant string or a regular expression is permitted in the pattern) is decidable.
We note that the string-replace function, even under this restriction, is sufficiently powerful for expressing the concatenation operator and much more (e.g. extensions of regular expressions with string variables). This gives us the most expressive decidable logic containing concatenation, replace, and regular constraints under the same umbrella. Our decision procedure for the straight-line fragment follows an automata-theoretic approach, and is modular in the sense that the string-replace terms are removed one by one, each removal generating further regular constraints, which can then be discharged by state-of-the-art string constraint solvers. We also show that this fragment is, in a way, a maximal decidable subclass of the straight-line fragment with string-replace and regular constraints. To this end, we show undecidability results for the following two extensions: (1) variables are permitted in the pattern parameter of the replace function, and (2) length constraints are permitted.
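The claim that replace-all with constant patterns can express concatenation can be illustrated with a small sketch. It assumes two placeholder letters that do not occur in the working alphabet; the names `replace_all`, `concat_via_replace_all`, `FRESH_A` and `FRESH_B` are hypothetical and introduced only for this illustration. Python's built-in `str.replace` stands in for the replace-all function with a constant pattern. This is one possible encoding under those assumptions, not necessarily the construction used in the paper.

```python
# Illustrative sketch: concatenation encoded with replace-all on constant patterns.
# FRESH_A and FRESH_B are assumed to lie outside the alphabet of x and y.
FRESH_A = "\u0001"
FRESH_B = "\u0002"


def replace_all(subject: str, pattern: str, replacement: str) -> str:
    """Replace every occurrence of a constant, non-empty pattern in subject."""
    if not pattern:
        raise ValueError("pattern must be a non-empty constant string")
    return subject.replace(pattern, replacement)


def concat_via_replace_all(x: str, y: str) -> str:
    """Compute x + y using only replace_all and a constant template string."""
    if FRESH_A in x + y or FRESH_B in x + y:
        raise ValueError("placeholder letters must not occur in the operands")
    template = FRESH_A + FRESH_B
    # Straight-line assignments: each new variable depends only on earlier ones.
    step1 = replace_all(template, FRESH_A, x)  # x followed by FRESH_B
    step2 = replace_all(step1, FRESH_B, y)     # x followed by y
    return step2


if __name__ == "__main__":
    print(concat_via_replace_all("ab", "ba"))
```

The two assignments inside `concat_via_replace_all` also have the straight-line shape discussed above: no variable is defined in terms of itself or of a later variable, which is the form of constraint that symbolic execution typically produces.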
