Abstracting Strings for Model Checking of C Programs

Data type abstraction plays a crucial role in software verification. In this paper, we introduce a domain for abstracting strings in the C programming language, where strings are managed as null-terminated arrays of characters. The new domain, M-String, is parametrized on an index (bound) domain and a character domain. By means of these two constituent domains, M-String captures shape information on the array structure as well as value information on the characters occurring in the string. By tuning these two parameters, M-String can be easily tailored to specific verification tasks, balancing precision against complexity. The concrete and abstract semantics of basic operations on strings are carefully formalized, and soundness proofs are fully detailed. Moreover, for a selection of functions from the standard C library, we provide the semantics for character access and update, enabling an automatic lifting of arbitrary string-manipulating code into our new domain. An implementation of the abstract operations is provided within a tool that automatically lifts existing programs into the M-String domain, along with an explicit-state model checker. The accuracy of the proposed domain is experimentally evaluated on real-case test programs, showing that M-String can efficiently detect real-world bugs and prove that programs no longer contain them once they are fixed.
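
As a rough illustration of the shape and value information mentioned above, the following C sketch splits a null-terminated buffer into maximal runs of equal characters and records the index of the terminator. It is not the paper's formal definition: the names `segment`, `segmentation`, `segment_string` and `MAX_SEGMENTS` are hypothetical, and both bounds and characters are kept concrete here, whereas M-String would draw them from the chosen abstract index and character domains.

```c
#include <stddef.h>

/* Hypothetical illustration only: one segment is a maximal run of equal
 * characters covering the half-open index range [from, to). */
typedef struct {
    size_t from;
    size_t to;
    char   value;
} segment;

#define MAX_SEGMENTS 16 /* arbitrary limit chosen for this sketch */

typedef struct {
    segment seg[MAX_SEGMENTS];
    size_t  count;      /* number of segments before the terminator */
    size_t  terminator; /* index of the first '\0' in the buffer */
} segmentation;

/* Segments the string stored in buf (a buffer of `size` bytes).
 * Returns 0 on success, or -1 if the buffer holds no '\0' (so it is not a
 * well-formed C string) or if more than MAX_SEGMENTS runs are needed. */
int segment_string(const char *buf, size_t size, segmentation *out)
{
    size_t i = 0;

    out->count = 0;
    out->terminator = 0;

    while (i < size && buf[i] != '\0') {
        size_t start = i;
        char c = buf[i];

        /* Advance to the end of the current run of equal characters. */
        while (i < size && buf[i] == c)
            i++;

        if (out->count == MAX_SEGMENTS)
            return -1;

        out->seg[out->count].from = start;
        out->seg[out->count].to = i;
        out->seg[out->count].value = c;
        out->count++;
    }

    if (i == size)
        return -1; /* ran off the buffer without finding a terminator */

    out->terminator = i;
    return 0;
}
```

For a buffer declared as `char buf[8] = "aaabb";`, this sketch yields two segments, `[0,3)` holding `'a'` and `[3,5)` holding `'b'`, with the terminator at index 5. Replacing the concrete `size_t` bounds and `char` values with abstract ones is where the precision versus complexity trade-off described above would come in.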
