Constraint normalization and parameterized caching for quantitative program analysis

Symbolic program analysis techniques rely on satisfiability-checking constraint solvers, while quantitative program analysis techniques rely on model-counting constraint solvers. Hence, the efficiency of satisfiability checking and model counting is crucial to the efficiency of modern program analysis techniques. In this paper, we present a constraint caching framework to expedite potentially expensive satisfiability and model-counting queries. Integral to this framework is our new constraint normalization procedure, which preserves the cardinality of the solution set of a constraint but not necessarily the solution set itself. We extend these constraint normalization techniques to string constraints in order to support analysis of string-manipulating code. We express our normalization techniques in a group-theoretic framework that generalizes earlier results on constraint normalization. We also present a parameterized caching approach in which, in addition to storing the result of a model-counting query, we store a model-counter object in the constraint store that allows us to efficiently recount the number of satisfying models for different maximum bounds. We implement our caching framework in our tool Cashew, which is built as an extension of the Green caching framework, and integrate it with the symbolic execution tool Symbolic PathFinder (SPF) and the model-counting constraint solver ABC. Our experiments show that constraint caching can significantly improve the performance of symbolic and quantitative program analyses. For instance, Cashew normalizes the 10,104 unique constraints in the SMC/Kaluza benchmark down to 394 normal forms, achieves a 10x speedup on the SMC/Kaluza-Big dataset, and achieves an average 3x speedup in our SPF-based side-channel analysis experiments.
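
The two ideas above, count-preserving normalization and parameterized caching, can be illustrated with a toy sketch. It is not Cashew's algorithm or API: it assumes a deliberately tiny constraint language (conjunctions of `var >= c` and `var <= c` over non-negative integers), and the names `normalize`, `IntervalCounter`, and `CountCache` are hypothetical. Renaming variables changes the solution set but not its size, so constraints that differ only in variable names and clause order share one cache entry, and the cached counter object can be re-queried for any maximum bound.

```python
from collections import defaultdict


def normalize(constraint):
    """Return a name-independent normal form of a conjunction.

    `constraint` is an iterable of (var, op, const) atoms with op in
    {">=", "<="}. Variables are renamed v0, v1, ... in an order that
    depends only on the atoms attached to each variable.
    """
    per_var = defaultdict(set)
    for var, op, const in constraint:
        per_var[var].add((op, const))
    # Sort each variable's atoms, then sort the variables by those atoms,
    # so the result ignores the original names and the clause order.
    groups = sorted(tuple(sorted(atoms)) for atoms in per_var.values())
    return tuple(
        (f"v{i}", op, const)
        for i, atoms in enumerate(groups)
        for op, const in atoms
    )


class IntervalCounter:
    """Toy model-counter object: one [lo, hi] interval per variable."""

    def __init__(self, normal_form):
        self.bounds = {}
        for var, op, const in normal_form:
            lo, hi = self.bounds.get(var, (0, None))
            if op == ">=":
                lo = max(lo, const)
            elif op == "<=":
                hi = const if hi is None else min(hi, const)
            else:
                raise ValueError(f"unsupported operator: {op}")
            self.bounds[var] = (lo, hi)

    def count(self, max_bound):
        """Count solutions with every variable in 0..max_bound."""
        total = 1
        for lo, hi in self.bounds.values():
            upper = max_bound if hi is None else min(hi, max_bound)
            total *= max(0, upper - lo + 1)
        return total


class CountCache:
    """Cache keyed by normal form; stores counter objects, not numbers."""

    def __init__(self):
        self.store = {}
        self.hits = 0

    def count(self, constraint, max_bound):
        key = normalize(constraint)
        counter = self.store.get(key)
        if counter is None:
            counter = IntervalCounter(key)
            self.store[key] = counter
        else:
            self.hits += 1
        return counter.count(max_bound)


cache = CountCache()
first = cache.count([("x", ">=", 2), ("y", "<=", 3)], max_bound=10)
# Same shape with different names, clause order, and bound: a cache hit.
second = cache.count([("q", "<=", 3), ("p", ">=", 2)], max_bound=5)
print(first, second, cache.hits)
```

Storing the counter object rather than a single number is what makes the second query cheap even though its bound differs; in a realistic setting the stored object would be whatever reusable structure the model counter produces, which this sketch only approximates with per-variable intervals.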
