Taming undefined behavior in LLVM

A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make transformations easy to perform, and it should also support efficient and precise static analysis. In this paper, we study an aspect of IR design that has received little attention: the role of undefined behavior. The IR of every optimizing compiler we have looked at, including GCC, LLVM, Intel's, and Microsoft's, supports one or more forms of undefined behavior (UB). UB serves three purposes here: it reflects the semantics of UB-heavy programming languages such as C and C++, it models inherently unsafe low-level operations such as memory stores, and it keeps IR semantics from being so constrained that desirable transformations become illegal. The current semantics of LLVM's IR fails to justify some cases of loop unswitching, global value numbering, and other important "textbook" optimizations, which has led to long-standing bugs. We present solutions to the problems we have identified in LLVM's IR. We show that most optimizations currently in LLVM remain sound under these solutions, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or the performance of the generated code.
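The loop-unswitching problem can be made concrete with a small executable model. The Python sketch below rests on stated assumptions and is not LLVM code. Poison is modeled as a sentinel value. A conditional branch on poison is assumed to be immediate UB. The names `POISON`, `branch`, `original`, `unswitched`, `freeze`, and `unswitched_frozen` are hypothetical and are not LLVM APIs. The original loop never evaluates its loop-invariant condition when it runs zero times. The unswitched version evaluates that condition unconditionally before entering the loop. So with `n == 0` and a poison condition, the transformation introduces UB that the source program did not have. The `freeze` helper is an assumption modeled on the idea behind LLVM's `freeze` instruction, which pins a poison value to an arbitrary but fixed value.

```python
# Toy model of poison semantics (a sketch under stated assumptions, not LLVM).

class UndefinedBehavior(Exception):
    """Raised when the modeled program executes an operation with UB."""

POISON = object()  # hypothetical stand-in for an LLVM poison value

def branch(cond):
    # Assumption: a conditional branch on poison is immediate UB.
    if cond is POISON:
        raise UndefinedBehavior("branch on poison")
    return bool(cond)

def original(n, c):
    """Source loop: for (i = 0; i < n; i++) { if (c) foo(); else bar(); }"""
    trace = []
    for _ in range(n):
        # c is only tested if the loop body actually runs.
        trace.append("foo" if branch(c) else "bar")
    return trace

def unswitched(n, c):
    """Loop unswitching: the invariant test of c is hoisted out of the loop."""
    if branch(c):  # evaluated even when n == 0
        return ["foo" for _ in range(n)]
    return ["bar" for _ in range(n)]

def freeze(v, arbitrary=False):
    # Hypothetical model of a freeze operation: poison becomes an
    # arbitrary but fixed value (here False); other values pass through.
    return arbitrary if v is POISON else v

def unswitched_frozen(n, c):
    """Unswitching that freezes the hoisted condition before branching."""
    if branch(freeze(c)):
        return ["foo" for _ in range(n)]
    return ["bar" for _ in range(n)]
```

With `n == 0` and `c = POISON`, `original` returns an empty list, while `unswitched` raises `UndefinedBehavior`. This mismatch makes the plain transformation unsound under this semantics. `unswitched_frozen` returns an empty list again. For `n > 0`, the original loop already branches on poison. The source program therefore already has UB, so any behavior of the transformed loop is acceptable in that case.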
