String constraints with concatenation and transducers solved efficiently

String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications, including automatic detection of cross-site scripting (XSS) and automatic test-case generation. A popular string analysis technique is symbolic execution, which at its core uses constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with concatenation as an atomic operation. In recent years, researchers have started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions into theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable and sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. the straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata (AFA) as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also the possibility of succinctly representing otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature.
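
To make the emptiness problem concrete, the following is a minimal illustrative sketch, not the solver described in the paper. It checks AFA non-emptiness by naive breadth-first search over configurations (sets of states that must all accept the remaining input), which is the exponential-sized graph mentioned above; the paper instead delegates this exploration to model checking algorithms such as IC3. The function name `afa_witness`, the DNF encoding of transitions, and the example automaton are all assumptions made for illustration.

```python
from collections import deque
from itertools import product


def afa_witness(initial_config, finals, delta, alphabet):
    """Return a shortest accepted word, or None if the language is empty.

    Hypothetical encoding: delta[(state, letter)] is a list of frozensets,
    read as a disjunction of conjunctions of successor states. A missing
    entry means the transition formula is false.
    """
    start = frozenset(initial_config)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        config, word = queue.popleft()
        # A configuration accepts the empty word if all its states are final.
        if config <= finals:
            return word
        for letter in alphabet:
            # Each state in the conjunction picks one of its disjuncts.
            choices = [delta.get((q, letter), []) for q in config]
            for pick in product(*choices):
                successor = frozenset().union(*pick)
                if successor not in seen:
                    seen.add(successor)
                    queue.append((successor, word + letter))
    return None


# Example: "contains an a" AND "contains a b", expressed as a conjunction
# of two states rather than as a product automaton.
T = frozenset({"T"})
delta = {
    ("A", "a"): [T], ("A", "b"): [frozenset({"A"})],
    ("B", "b"): [T], ("B", "a"): [frozenset({"B"})],
    ("T", "a"): [T], ("T", "b"): [T],
}
witness = afa_witness({"A", "B"}, T, delta, "ab")

# Example with an empty language: "only a's" AND "contains a b".
delta_empty = {
    ("OnlyA", "a"): [frozenset({"OnlyA"})],
    ("B", "b"): [T], ("B", "a"): [frozenset({"B"})],
    ("T", "a"): [T], ("T", "b"): [T],
}
no_witness = afa_witness({"OnlyA", "B"}, frozenset({"OnlyA", "T"}), delta_empty, "ab")
```

The conjunction of requirements costs one state per requirement here, whereas a nondeterministic automaton would need a product construction; the price is that the search ranges over sets of states, which is why a symbolic engine is preferable to this explicit search on realistic inputs.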
