A Length-aware Regular Expression SMT Solver

Motivated by program analysis, security, and verification applications, we study various fragments of a rich quantifier-free (QF) first-order theory $T_{LRE,n,c}$ over the regular expression (regex) membership predicate, linear integer arithmetic over string length, the string-number conversion predicate, and string concatenation. Our contributions are as follows. On the theoretical side, we prove a series of (un)decidability and complexity theorems for various fragments of $T_{LRE,n,c}$, some of which resolve questions that have been open for several years. On the practical side, we present a novel length-aware decision procedure for the QF first-order theory $T_{LRE}$ with the regex membership predicate and linear arithmetic over string length. The crucial insight that enables our algorithm to scale on instances obtained from practical applications is that these instances contain a wealth of information about upper and lower bounds on the lengths of strings, which can be used to simplify operations on the automata representing regexes. We showcase the power of our algorithm via an extensive empirical evaluation over a large and diverse benchmark of over 57,000 regex-heavy instances, derived from a mix of industrial applications, instances contributed by other solver developers, and randomly generated instances. Specifically, our solver outperforms five other state-of-the-art string solvers, namely CVC4, Z3str3, Z3-Trau, OSTRICH, and Z3seq, on this benchmark.
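The abstract does not spell out the decision procedure itself, so the following is only a minimal sketch of the general idea that length bounds can cut down automaton operations. It is not the paper's algorithm. It assumes each regex has already been compiled to a DFA, and the encoding (start state, accepting set, transition dictionary) and the function name `bounded_witness` are hypothetical. Given constraints $x \in R_1 \wedge \dots \wedge x \in R_k \wedge lo \le |x| \le hi$, it explores the product of the DFAs lazily, one length layer at a time, and stops at the upper bound `hi` instead of building the full product automaton.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical DFA encoding: (start state, accepting states, transitions),
# where transitions[state][ch] is the next state. A missing transition means
# the DFA rejects (an implicit dead state).
DFA = Tuple[int, FrozenSet[int], Dict[int, Dict[str, int]]]


def bounded_witness(dfas: List[DFA], alphabet: str, lo: int, hi: int) -> Optional[str]:
    """Return a string w with lo <= len(w) <= hi accepted by every DFA, or None.

    The product automaton is explored lazily, one length layer at a time,
    and is never unrolled past the upper bound `hi`.
    """
    start = tuple(d[0] for d in dfas)
    layer: Dict[Tuple[int, ...], str] = {start: ""}  # product state -> one witness of this length
    for length in range(hi + 1):
        if length >= lo:
            for states, word in layer.items():
                if all(s in d[1] for s, d in zip(states, dfas)):
                    return word
        if length == hi:
            break  # the upper bound stops the unrolling here
        nxt: Dict[Tuple[int, ...], str] = {}
        for states, word in layer.items():
            for ch in alphabet:
                succ = []
                for s, (_, _, delta) in zip(states, dfas):
                    t = delta.get(s, {}).get(ch)
                    if t is None:
                        break  # some DFA rejects this extension
                    succ.append(t)
                else:
                    nxt.setdefault(tuple(succ), word + ch)
        if not nxt:
            return None  # no longer string can be read by all DFAs
        layer = nxt
    return None


# r1 accepts (ab)*, r2 accepts strings over {a, b} that start with 'a'.
r1: DFA = (0, frozenset({0}), {0: {"a": 1}, 1: {"b": 0}})
r2: DFA = (0, frozenset({1}), {0: {"a": 1}, 1: {"a": 1, "b": 1}})
print(bounded_witness([r1, r2], "ab", 3, 5))  # abab
print(bounded_witness([r1, r2], "ab", 3, 3))  # None: the intersection (ab)+ has only even lengths
```

The second call shows how an upper bound can make an instance unsatisfiable after exploring only `hi` layers. A real length-aware solver would also need to handle unbounded lengths and linear arithmetic over several string variables, which this sketch deliberately omits.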
