String solving with word equations and transducers: towards a logic for analysing mutation XSS

We study the fundamental issue of decidability of satisfiability over string logics with concatenations and finite-state transducers as atomic operations. Although restricting to either type of operation alone yields decidability, little is known about the decidability of their combined theory, which is especially relevant when analysing security vulnerabilities of dynamic web pages in a more realistic browser model. On the one hand, word equations (string logic with concatenations) cannot precisely capture sanitisation functions (e.g. htmlescape) and implicit browser transductions (e.g. innerHTML mutations). On the other hand, transducers have the reverse limitation: they can model sanitisation functions and browser transductions, but not string concatenations. Naively combining word equations and transducers easily leads to an undecidable logic. Our main contribution is to show that the "straight-line fragment" of the logic is decidable (complexity ranges from PSPACE to EXPSPACE). The fragment can express the program logics of straight-line string-manipulating programs with concatenations and transductions as atomic operations, which arise when performing bounded model checking or dynamic symbolic execution. We demonstrate that the logic can naturally express constraints required for analysing mutation XSS in web applications. Finally, the logic remains decidable in the presence of length, letter-counting, regular, indexOf, and disequality constraints.
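To make the notion of a straight-line program with concatenations and transductions concrete, the following is a minimal illustrative sketch, not the decision procedure of the paper. It assumes Python's standard `html.escape` as a stand-in for a sanitiser such as htmlescape, and uses `html.unescape` as a purely hypothetical model of an implicit browser transduction that undoes the escaping. The function name `render` and the regular constraint `ATTACK` are invented for illustration. The sketch runs one concrete input through the program, whereas satisfiability of the corresponding constraint asks whether any input at all reaches the attack pattern.

```python
import html
import re
from typing import Tuple

# Regular constraint describing an (illustrative) attack pattern.
ATTACK = re.compile(r"<script", re.IGNORECASE)


def render(name: str) -> Tuple[str, str]:
    """Straight-line program: each variable is assigned exactly once."""
    # Transduction: sanitise the untrusted input.
    x = html.escape(name)
    # Concatenation: embed the sanitised value in markup.
    y = "<p>" + x + "</p>"
    # Transduction: hypothetical mutation that decodes entities again.
    z = html.unescape(y)
    return y, z


y, z = render("<script>alert(1)</script>")
print(bool(ATTACK.search(y)))  # pattern absent after sanitisation
print(bool(ATTACK.search(z)))  # pattern present after the modelled mutation
```

In this sketch the sanitised string `y` does not match the attack pattern, while the mutated string `z` does, which is the kind of discrepancy that a combined theory of concatenation and transduction is meant to capture.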
