Which C semantics to embed in the front-end of a formally verified compiler?

We have been developing, and formally verifying in Coq, a moderately optimising compiler called Compcert for a large subset of the C language. This compiler comprises a back-end translating the Cminor intermediate language to PowerPC assembly code and a front-end translating the Clight subset of C to Cminor. Clight features all the types and operators of C as well as all the structured control statements of C, but excludes unstructured control. We have re-architected a previous front-end around the CIL library. CIL provides an industrial-strength parser and type-checker for the C language, as well as a simplifier that eliminates or makes explicit many features of this language. CIL is written in Caml and is also used in other tools dedicated to the verification of C programs. Because CIL performs more simplifications than are wanted in the context of a verified compiler, we have deactivated the unwanted ones. Our formalisation of C in Coq has been extended in two ways. Firstly, the abstract syntax describes a larger subset of C. For example, recursive struct and union types have been defined using a μ operator, and a limited switch statement has been added to the three languages of the front-end. Secondly, the semantics of C is defined coinductively using natural semantics rules for divergence, thus modelling non-terminating programs. The proofs of semantic preservation of the front-end have also been reused and extended in order to handle our new Clight language. The main difficulty in designing our semantics of Clight was to find the right level of abstraction between, on the one hand, a semantics precise enough to support correctness proofs of the non-trivial code transformations performed by a compiler and, on the other hand, a semantics that is less strict than the C standard. For example, thanks to an abstract memory model, some popular violations of the C standard are given a meaning in our semantics, but many other violations cannot be accounted for.
As a result of this work, the Compcert compiler is now able to compile some realistic examples of C source code.