Symbolic String Transformations with Regular Lookahead and Rollback

Implementing string transformation routines, such as encoders, decoders, and sanitizers, correctly and efficiently is a difficult and error prone task. Such routines are often used in security critical settings, process large amounts of data, and must work efficiently and correctly. We introduce a new declarative language called Bex that builds on elements of regular expressions, symbolic automata and transducers, and enables a compilation scheme into C, C# or JavaScript that avoids many of the potential sources of errors that arise when such routines are implemented directly. The approach allows correctness analysis using symbolic automata theory that is not possible at the level of the generated code. Moreover, the case studies show that the generated code consistently outperforms hand-optimized code.

[1]  George Varghese,et al.  Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia , 2007, ANCS '07.

[2]  Loris D'Antoni,et al.  Minimization of symbolic automata , 2014, POPL.

[3]  Somesh Jha,et al.  Deflating the big bang: fast and scalable deep packet inspection with extended finite automata , 2008, SIGCOMM '08.

[4]  Patrice Godefroid,et al.  Compositional dynamic test generation , 2007, POPL '07.

[5]  Rajeev Alur,et al.  Regular Transformations of Infinite Strings , 2012, 2012 27th Annual IEEE Symposium on Logic in Computer Science.

[6]  Yasuhiko Minamide,et al.  Static approximation of dynamically generated Web pages , 2005, WWW '05.

[7]  Loris D'Antoni,et al.  Equivalence of Extended Symbolic Finite Transducers , 2013, CAV.

[8]  Domagoj Babic,et al.  Sigma*: symbolic learning of input-output specifications , 2013, POPL.

[9]  Christopher Krügel,et al.  Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[10]  Nikolaj Bjørner,et al.  Symbolic Tree Transducers , 2011, Ershov Memorial Conference.

[11]  Hiroshi Inamura,et al.  Dynamic test input generation for web applications , 2008, ISSTA '08.

[12]  Nikolaj Bjørner,et al.  Symbolic finite state transducers: algorithms and applications , 2012, POPL '12.

[13]  Nikolaj Bjørner,et al.  Z3: An Efficient SMT Solver , 2008, TACAS.

[14]  Pavol Cerný,et al.  Streaming transducers for algorithmic verification of single-pass list-processing programs , 2010, POPL '11.

[15]  Loris D'Antoni,et al.  Static Analysis of String Encoders and Decoders , 2013, VMCAI.

[16]  Benjamin Livshits,et al.  Fast and Precise Sanitizer Analysis with BEK , 2011, USENIX Security Symposium.

[17]  Luc Segoufin Automata and Logics for Words and Trees over an Infinite Alphabet , 2006, CSL.

[18]  Nissim Francez,et al.  Finite-Memory Automata , 1994, Theor. Comput. Sci..

[19]  Bertrand Jeannet,et al.  Lattice Automata: A Representation for Languages on Infinite Alphabets, and Some Applications to Verification , 2007, SAS.

[20]  Benjamin Livshits,et al.  Merlin: specification inference for explicit information flow problems , 2009, PLDI '09.

[21]  Shou-Feng Wang,et al.  𝒫𝒮-regular languages , 2011, Int. J. Comput. Math..

[22]  Aske Simon Christensen,et al.  Precise Analysis of String Expressions , 2003, SAS.