Semantic foundations of intermediate program representations

An end-to-end guarantee of software correctness by formal verification must consider two sources of bugs. First, the verification tool itself must be correct. Second, programs are often verified at the source level, before being compiled, so compilers must also be trustworthy. The complexity of verifiers and compilers is increasing. To simplify code analysis and manipulation, these tools rely on intermediate representations (IRs) of programs, which provide structural and semantic properties. This thesis gives a formal, semantic account of IRs, so that they can also be leveraged in the formal proofs of such tools. We first study a register-based IR of Java bytecode used in compilers and verifiers. We specify the IR generation by a semantic theorem stating what the transformation preserves, e.g., object initialization or exceptions, but also what it modifies and how, e.g., object allocation. We implement this IR in Sawja, a Java static analysis toolbench. Then, we study the Static Single Assignment (SSA) form, an IR widely used in modern compilers and verifiers. We implement and prove correct in Coq an SSA middle-end for the CompCert C compiler. For the proof of SSA optimizations, we identify a key semantic property of SSA that allows for equational reasoning. Finally, we study the semantics of concurrent Java IRs. Because of the instruction reorderings performed by the compiler and the hardware, the current definition of the Java Memory Model (JMM) is complex and, unfortunately, formally flawed. Targeting x86 architectures, we identify a subset of the JMM that is intuitive and tractable in formal proofs. We characterize the reorderings it allows, and factor out a proof common to the IRs of a compiler.
