The missing link: explaining ELF static linking, semantically

Beneath the surface, software usually depends on complex linker behaviour to work as intended. Even linking hello_world.c is surprisingly involved, and systems software such as libc and operating system kernels rely on a host of linker features. But linking is poorly understood by working programmers and has largely been neglected by language researchers. In this paper we survey the many use-cases that linkers support and the poorly specified "linker-speak" by which they are controlled: metadata in object files, command-line options, and the linker-script language. We provide the first validated formalisation of a realistic executable and linkable format (ELF), and capture aspects of the Application Binary Interfaces (ABIs) for four mainstream platforms (AArch64, AMD64, Power64, and IA32). Using these, we develop an executable specification of static linking that covers, among other things, enough to link small C programs (we use the example of bzip2) into a correctly running executable. We provide our specification in Lem and Isabelle/HOL forms. This is the first formal specification of mainstream linking. We have used the Isabelle/HOL version to prove a sample correctness property for one case of AMD64 ABI relocation, demonstrating that the specification supports formal proof, and as a first step towards the much more ambitious goal of verified linking. Our work should enable several novel strands of research, including linker-aware verified compilation and program analysis, and better languages for controlling linking.
