Stratified synthesis: automatically learning the x86-64 instruction set

The x86-64 ISA sits at the bottom of the software stack for most desktop and server software. Because of its importance, many software analysis and verification tools depend, either explicitly or implicitly, on a correct model of the semantics of x86-64 instructions. However, formal semantics for the x86-64 ISA are difficult to obtain and are often written manually at great effort. We describe an automatically synthesized formal semantics of the input/output behavior for a large fraction of the x86-64 Haswell ISA’s many thousands of instruction variants. The key to our results is stratified synthesis, in which we use a set of instructions whose semantics are known to synthesize the semantics of additional instructions whose semantics are unknown. As the set of formally described instructions grows, the synthesis vocabulary expands, making it possible to synthesize the semantics of increasingly complex instructions. Using this technique, we automatically synthesized formal semantics for 1,795 instruction variants of the x86-64 Haswell ISA. We evaluate the learned semantics against manually written semantics (where available) and find that they are formally equivalent except for 50 instructions, for which the manually written semantics contain an error. We further find the learned formulas to be largely as precise as manually written ones and of similar size.
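The stratified idea can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not taken from the paper: the two-register 8-bit machine, the instruction names (`not_a`, `inc_a`, `add_ab`, `neg_a`, `rsub_ab`, `sub_ab`), brute-force enumeration as the search procedure, and exhaustive testing as the equivalence check. The only property it tries to show is that each newly learned instruction joins the vocabulary and may make a later instruction reachable.

```python
import itertools
import random

MASK = 0xFF  # toy 8-bit registers (an assumption, not the real x86-64 state)

# Base set: hypothetical instructions whose semantics are taken as known,
# modeled as functions on a two-register state (a, b).
BASE = {
    "not_a": lambda a, b: (~a & MASK, b),
    "inc_a": lambda a, b: ((a + 1) & MASK, b),
    "add_ab": lambda a, b: ((a + b) & MASK, b),
}

# Targets: hypothetical instructions treated as black boxes. The search may
# only execute them on concrete inputs, never inspect their definitions.
TARGETS = {
    "neg_a": lambda a, b: (-a & MASK, b),
    "rsub_ab": lambda a, b: ((b - a) & MASK, b),
    "sub_ab": lambda a, b: ((a - b) & MASK, b),
}


def run(program, vocab, state):
    """Execute a sequence of instruction names on a state (a, b)."""
    for name in program:
        state = vocab[name](*state)
    return state


def synthesize(target, vocab, tests, max_len):
    """Return the first program over vocab that matches target, or None."""
    for length in range(1, max_len + 1):
        for program in itertools.product(sorted(vocab), repeat=length):
            # Cheap filter: compare against the black box on a few test inputs.
            if any(run(program, vocab, s) != target(*s) for s in tests):
                continue
            # Exhaustive check over all 2^16 states; feasible only because the
            # toy machine is tiny. It stands in for a formal equivalence check.
            if all(run(program, vocab, (a, b)) == target(a, b)
                   for a in range(MASK + 1) for b in range(MASK + 1)):
                return program
    return None


def make_semantics(program, vocab):
    """Learned semantics: the composition of already-known instructions."""
    return lambda a, b: run(program, vocab, (a, b))


def stratified_synthesis(base, targets, max_len=2, seed=0):
    """Learn targets in rounds; each round's results extend the vocabulary."""
    rng = random.Random(seed)
    tests = [(rng.randrange(MASK + 1), rng.randrange(MASK + 1))
             for _ in range(16)]
    vocab = dict(base)
    remaining = dict(targets)
    learned = {}
    stratum = 0
    while remaining:
        stratum += 1
        found = {}
        for name, target in remaining.items():
            program = synthesize(target, vocab, tests, max_len)
            if program is not None:
                found[name] = program
        if not found:
            break  # no progress: the rest is out of reach at this max_len
        for name, program in found.items():
            learned[name] = (stratum, program)
            vocab[name] = make_semantics(program, vocab)
            del remaining[name]
    return learned


if __name__ == "__main__":
    for name, (stratum, program) in stratified_synthesis(BASE, TARGETS).items():
        print(f"stratum {stratum}: {name} = {'; '.join(program)}")
```

With programs capped at two instructions, the three targets in this sketch are learned in three successive rounds: `sub_ab` has no two-instruction implementation over the base set alone, but becomes reachable once `neg_a` and then `rsub_ab` have been added to the vocabulary.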
