An Evaluation of Automata Algorithms for String Analysis

There has been significant recent interest in automated reasoning techniques, in particular constraint solvers, for string variables. These techniques support a wide variety of clients, ranging from static analysis to automated testing. Most string constraint solvers rely on finite automata to support regular expression constraints. For these approaches, performance depends critically on fast automata operations such as intersection, complementation, and determinization. Existing work in this area has not yet established which core algorithms and data structures work best in practice. In this paper, we study a comprehensive set of algorithms and data structures for performing fast automata operations. Our goal is to provide an apples-to-apples comparison between techniques that are used in current tools. To achieve this, we re-implemented a number of existing techniques and use an established set of regular expression benchmarks as an indicative workload. We also include several techniques that, to the best of our knowledge, have not yet been used for string constraint solving. Our results show a substantial performance difference across techniques, which has implications for future tool design.
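As an illustration of one of the automata operations named above, the following is a minimal sketch of intersection by product construction, exploring only reachable state pairs. It is not taken from the paper or from any of the evaluated tools. The DFA encoding (a `(start, accepting, delta)` tuple with `delta` keyed by `(state, symbol)`), the function names `intersect` and `accepts`, and the two example automata are all assumptions made for this sketch.

```python
from collections import deque


def index_by_state(delta):
    """Group a {(state, symbol): target} map as {state: {symbol: target}}."""
    out = {}
    for (state, symbol), target in delta.items():
        out.setdefault(state, {})[symbol] = target
    return out


def intersect(dfa_a, dfa_b):
    """Build the product DFA accepting L(dfa_a) ∩ L(dfa_b).

    Each DFA is a hypothetical (start, accepting, delta) tuple; a missing
    transition means the word is rejected. Only pairs of states reachable
    from the joint start state are constructed.
    """
    start_a, acc_a, delta_a = dfa_a
    start_b, acc_b, delta_b = dfa_b
    out_a, out_b = index_by_state(delta_a), index_by_state(delta_b)

    start = (start_a, start_b)
    accepting, delta = set(), {}
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if p in acc_a and q in acc_b:
            accepting.add((p, q))
        for symbol, p_next in out_a.get(p, {}).items():
            q_next = out_b.get(q, {}).get(symbol)
            if q_next is None:
                continue  # no matching move in dfa_b: dead pair, skip it
            pair = (p_next, q_next)
            delta[((p, q), symbol)] = pair
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return start, accepting, delta


def accepts(dfa, word):
    """Run the DFA on word; reject as soon as a transition is missing."""
    state, accepting, delta = dfa
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in accepting


# Example automata over {a, b} (assumed for illustration):
# even_a accepts words with an even number of 'a'; ends_b accepts words ending in 'b'.
even_a = (0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
ends_b = (0, {1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1})
both = intersect(even_a, ends_b)
```

Here `accepts(both, "aab")` holds while `accepts(both, "ab")` does not. The sketch stores one transition per concrete symbol, which is only one of several possible representations; the relative cost of such representation choices is the kind of question the evaluation addresses.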
