Formally verified big step semantics out of x86-64 binaries

This paper presents a methodology for generating formally proven equivalence theorems between decompiled x86-64 machine code and big step semantics. These proofs build on two further contributions: first, a robust and tested formal x86-64 machine model containing small step semantics for 1625 instructions; second, a decompilation-into-logic methodology that supports both x86-64 assembly and machine code at large scale. This work enables black-box binary verification, i.e., formal verification of a binary whose source code is unavailable. It can therefore be applied to safety-critical systems that include legacy components, or components whose source code is withheld for proprietary reasons. The methodology minimizes the trusted code base by leveraging machine-learned semantics to build the formal machine model. We apply it to several case studies, including binaries that rely heavily on the SSE2 floating-point instruction set and binaries compiled from C code with inlined assembly.
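To make the distinction between small step and big step semantics concrete, the following Python sketch uses a hypothetical toy machine state (`State`, `step`, `run`, `big_step`, and `BLOCK` are illustrative names, not the paper's Isabelle/HOL model). `step` executes one instruction (small step); `big_step` is a single function from initial to final state for a three-instruction block, playing the role of the decompiled summary. Flags, memory, and all but three registers are omitted as simplifying assumptions. The check below only tests equivalence on sampled states, whereas the paper's theorems are proofs that hold for all states.

```python
import random
from dataclasses import dataclass, replace

MASK64 = (1 << 64) - 1  # general-purpose registers are 64 bits wide


@dataclass(frozen=True)
class State:
    """Toy machine state; flags and memory are omitted (assumption)."""
    rip: int
    rax: int
    rsi: int
    rdi: int


# Each instruction: (mnemonic, destination, source, encoded length in bytes).
Instr = tuple[str, str, str, int]


def step(s: State, instr: Instr) -> State:
    """Small step semantics: execute exactly one instruction."""
    op, dst, src, length = instr
    if op == "mov":
        value = getattr(s, src)
    elif op == "add":
        value = (getattr(s, dst) + getattr(s, src)) & MASK64
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return replace(s, **{dst: value, "rip": s.rip + length})


def run(block: list[Instr], s: State) -> State:
    """Iterate the small step semantics over a straight-line block."""
    for instr in block:
        s = step(s, instr)
    return s


# Hypothetical block: mov rax, rdi; add rax, rsi; add rax, rax
# (each a 3-byte REX.W-prefixed encoding).
BLOCK: list[Instr] = [
    ("mov", "rax", "rdi", 3),
    ("add", "rax", "rsi", 3),
    ("add", "rax", "rax", 3),
]


def big_step(s: State) -> State:
    """Big step summary of BLOCK: initial state to final state in one go."""
    return replace(s, rax=(2 * (s.rdi + s.rsi)) & MASK64, rip=s.rip + 9)


def equivalent_on_samples(n: int = 1000, seed: int = 0) -> bool:
    """Test (not prove) that run(BLOCK, s) == big_step(s) on random states."""
    rng = random.Random(seed)
    for _ in range(n):
        s = State(rip=rng.getrandbits(48), rax=rng.getrandbits(64),
                  rsi=rng.getrandbits(64), rdi=rng.getrandbits(64))
        if run(BLOCK, s) != big_step(s):
            return False
    return True
```

For example, starting from `State(rip=0x1000, rax=7, rsi=MASK64, rdi=1)`, both `run(BLOCK, s)` and `big_step(s)` yield `rax == 0` (the sum wraps modulo 2^64) and `rip == 0x1009`. Sampling can only expose a divergence between the summary and the instruction-level model; a formal equivalence theorem rules out divergence on every state.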