Helper function inlining in dynamic binary translation

Dynamic binary translation (DBT) is the cornerstone of many important applications, yet developing and maintaining a real-world DBT system takes tremendous effort. To reduce this engineering effort, DBT developers frequently rely on helper functions. Although helper functions greatly simplify DBT development, the calls to them incur substantial performance overhead. To solve this problem, this paper presents a novel approach to inlining helper functions in DBT systems. The proposed approach addresses several unique technical challenges. As a result, it reduces the performance overhead introduced by helper function calls while preserving the benefits of helper functions for DBT development. We have implemented a prototype of the proposed approach on top of a popular DBT system, QEMU. Experimental results on programs from the SPEC CPU 2017 benchmark suite show an average speedup of 1.2x. Moreover, the translation overhead introduced by inlining helper functions is negligible.
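To make the idea of helper inlining concrete, the following is a minimal toy sketch, not the paper's implementation and not QEMU code. It assumes a made-up guest instruction `add_sat8`, a hypothetical helper `helper_add_sat8`, and an invented three-operand host op format. The translator either emits a call op that invokes the helper, or splices an equivalent sequence of primitive ops into the translated block.

```python
# Toy model of a DBT translator. All names and the op format are hypothetical.

def helper_add_sat8(a, b):
    """Hypothetical helper: unsigned 8-bit saturating add."""
    s = a + b
    return 255 if s > 255 else s

def inline_add_sat8(dst, a, b):
    """Primitive-op sequence assumed equivalent to helper_add_sat8."""
    return [
        ("add", dst, a, b),          # dst = a + b
        ("min_imm", dst, dst, 255),  # dst = min(dst, 255)
    ]

def translate(guest_block, inline):
    """Translate guest instructions into a flat list of host ops."""
    ops = []
    for insn, dst, a, b in guest_block:
        if insn == "add":
            ops.append(("add", dst, a, b))
        elif insn == "add_sat8":
            if inline:
                # Splice the helper body into the translated block.
                ops.extend(inline_add_sat8(dst, a, b))
            else:
                # Emit a call out of the translated block to the helper.
                ops.append(("call", helper_add_sat8, dst, a, b))
        else:
            raise ValueError(f"unknown guest instruction: {insn}")
    return ops

def execute(ops, regs):
    """Run host ops on a copy of the register file and return it."""
    regs = dict(regs)
    for op in ops:
        kind = op[0]
        if kind == "call":
            _, fn, dst, a, b = op
            regs[dst] = fn(regs[a], regs[b])
        elif kind == "add":
            _, dst, a, b = op
            regs[dst] = regs[a] + regs[b]
        elif kind == "min_imm":
            _, dst, a, imm = op
            regs[dst] = min(regs[a], imm)
        else:
            raise ValueError(f"unknown host op: {kind}")
    return regs

def count_calls(ops):
    """Number of helper call ops left in a translated block."""
    return sum(1 for op in ops if op[0] == "call")

guest_block = [("add", "r1", "r1", "r2"), ("add_sat8", "r0", "r1", "r2")]
initial_regs = {"r0": 0, "r1": 100, "r2": 100}

with_calls = translate(guest_block, inline=False)
inlined = translate(guest_block, inline=True)
```

Both translations compute the same register state, but the inlined one contains no call ops. In a real DBT system the saving would come from avoiding the call and the state save and restore around it; this sketch only models the structural change.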
