Exploring C semantics and pointer provenance

The semantics of pointers and memory objects in C has been a vexed question for many years. C values cannot be treated as either purely abstract or purely concrete entities: the language exposes their representations, but compiler optimisations rely on analyses that reason about provenance and initialisation status, not just runtime representations. The ISO WG14 standard leaves much of this unclear, and in some respects differs from de facto standard usage, which is itself difficult to investigate. In this paper we explore the possible source-language semantics for memory objects and pointers, in ISO C and in C as it is used and implemented in practice, focussing especially on pointer provenance. We aim, as far as possible, to reconcile the ISO C standard, mainstream compiler behaviour, and the semantics relied on by the corpus of existing C code. We present two coherent proposals, one that tracks provenance through integers and one that does not; both address many design questions. We highlight some pros and cons and open questions, and illustrate the discussion with a library of test cases. We make our semantics executable as a test oracle, integrating it with the Cerberus semantics for much of the rest of C, which we have made substantially more complete and robust and equipped with a web-interface GUI. This allows us to experimentally assess our proposals on those test cases. To assess their viability with respect to larger bodies of C code, we analyse the changes required and the resulting behaviour for a port of FreeBSD to CHERI, a research architecture supporting hardware capabilities, which (roughly speaking) traps on the memory safety violations that our proposals deem undefined behaviour. We also develop a new runtime instrumentation tool to detect possible provenance violations in normal C code, and apply it to some of the SPEC benchmarks. We compare our proposals with a source-language variant of the twin-allocation LLVM semantics proposal of Lee et al. Finally, we describe ongoing interactions with WG14, exploring how our proposals could be incorporated into the ISO standard.
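
To make the provenance question concrete, the following is a minimal illustrative sketch in C. It is not taken from the paper's test-case library, and every name in it (`same_representation`, `roundtrip_via_integer`, `write_through_roundtrip`, `one_past_x_looks_like_y`) is hypothetical. It assumes an implementation that provides `uintptr_t`. It shows a pointer that survives a round trip through an integer, and two pointers that may have identical runtime representations even though they are derived from different objects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

int x = 1, y = 2;   /* two distinct objects; whether they are adjacent is up to the implementation */

/* Returns 1 if the two pointer values have byte-identical representations. */
int same_representation(const int *p, const int *q) {
    return memcmp(&p, &q, sizeof p) == 0;
}

/* Casts a pointer to an integer and back. ISO C guarantees that the result
   compares equal to the original when uintptr_t exists; whether the result
   also carries the original provenance is the kind of question the two
   proposals answer. */
int *roundtrip_via_integer(int *p) {
    uintptr_t i = (uintptr_t)(void *)p;
    return (int *)(void *)i;
}

/* Writes through a round-tripped pointer to a live local object. */
int write_through_roundtrip(void) {
    int v = 1;
    int *q = roundtrip_via_integer(&v);
    assert(q == &v);
    *q = 11;          /* the object is live and q was derived from its address */
    return v;
}

/* Compares a one-past pointer derived from x with a pointer to y.
   Only the representations are inspected; nothing is dereferenced. */
int one_past_x_looks_like_y(void) {
    int *p = &x + 1;  /* one past x: may be formed, must not be dereferenced */
    int *q = &y;
    /* Even if this returns 1, an access through p is undefined behaviour in
       ISO C, and a compiler may assume that p does not alias y. */
    return same_representation(p, q);
}
```

Calling `write_through_roundtrip()` returns 11 on mainstream implementations. The result of `one_past_x_looks_like_y()` depends on how the implementation lays out `x` and `y`, which is why representation equality alone cannot decide whether an access is permitted.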
