Into the depths of C: elaborating the de facto standards

C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-prone situation. First, we describe an in-depth analysis of the design space for the semantics of pointers and memory in C as it is used in practice. We articulate many specific questions, build a suite of semantic test cases, gather experimental data from multiple implementations, and survey what C experts believe about the de facto standards. We identify questions where there is a consensus (whether following ISO or differing from it) and where there are conflicts. We apply all of this to an experimental C implemented above capability hardware. Second, we describe a formal model, Cerberus, for large parts of C. Cerberus is parameterised on its memory model; it is linkable either with a candidate de facto memory object model, under construction, or with an operational C11 concurrency model; it is defined by elaboration to a much simpler Core language for accessibility; and it is executable as a test oracle on small examples. Together, these contributions should provide a solid basis for discussion of what mainstream C is now: what programmers and analysis tools can assume and what compilers aim to implement. Ultimately we hope they will be a step towards clear, consistent, and accepted semantics for the various use-cases of C.
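To give a flavour of the kind of question a semantic test case about pointers and memory might ask, here is a minimal sketch in C. It is illustrative only and is not taken from the test suite described above. The function names are hypothetical, and the sketch assumes a mainstream platform that provides the optional `uintptr_t` type. Both examples are expected to behave as commented on such platforms; the interesting cases for a de facto semantics are the variations where implementations and expert opinion begin to disagree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example 1: can a pointer be cast to an integer and back,
 * and then used to access the original object?
 * Assumes uintptr_t exists (it is optional in ISO C). */
int roundtrip_via_integer(void) {
    int x = 1;
    int *p = &x;
    uintptr_t i = (uintptr_t)(void *)p; /* pointer to integer */
    int *q = (int *)(void *)i;          /* integer back to pointer */
    *q = 42;                            /* write through the rebuilt pointer */
    return x;
}

/* Hypothetical example 2: can a pointer be copied via its representation
 * bytes (here with memcpy) and the copy used for access? */
int roundtrip_via_bytes(void) {
    int x = 1;
    int *p = &x;
    int *q;
    memcpy(&q, &p, sizeof p);           /* copy the pointer's bytes */
    *q = 7;                             /* write through the copied pointer */
    return x;
}

/* Runs both examples and checks that the writes reached x. */
int check_examples(void) {
    assert(roundtrip_via_integer() == 42);
    assert(roundtrip_via_bytes() == 7);
    return 0;
}
```

Calling `check_examples()` from a `main` function runs both checks. A test oracle in the sense above would answer the same question from a semantic model instead of from one compiler's observed behaviour.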
