Optimizing Memory Access Performance Using Hardware Assisted Virtualization in Retargetable Dynamic Binary Translation

Dynamic Binary Translation (DBT) is one of the most efficient strategies for simulating Systems-on-Chip, yet recent studies show that a large part of the simulation time is spent performing memory accesses. Indeed, simulating each load and store instruction requires software emulation of the hardware Memory Management Unit (MMU). In this work, we propose to perform memory accesses in hardware, taking advantage of the hardware-assisted virtualization capabilities now available in modern processors. To do so, we set up and maintain shadow page tables, as any regular hypervisor would, and run the entire simulator on a virtual CPU. Each load and store instruction can then be translated into just a couple of load and store instructions executing at regular speed, entirely avoiding the overhead of software MMU emulation. The goal of this paper is to explain how this can be done. To demonstrate the idea, we have implemented our approach in the QEMU retargetable DBT engine, speeding up simulation by as much as 40%.
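To make the cost of software MMU emulation concrete, the following is a minimal sketch in C of the kind of work a DBT engine might perform for every simulated load. It is not QEMU's actual code: the names `soft_tlb_entry`, `soft_tlb`, `slow_path_fill`, and `soft_mmu_load32`, the 4 KiB page size, the direct-mapped 256-entry table, and the trivial wrap-around mapping that stands in for a guest page table walk are all assumptions made for illustration. Accesses are assumed to be 4-byte aligned so they never straddle a page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS   12
#define PAGE_SIZE   (1u << PAGE_BITS)
#define PAGE_MASK   (~(uint32_t)(PAGE_SIZE - 1))
#define TLB_ENTRIES 256
#define INVALID_TAG 0xFFFFFFFFu   /* not page aligned, so it never matches */

/* Hypothetical software TLB entry: guest virtual page -> host page pointer. */
typedef struct {
    uint32_t guest_page;
    uint8_t *host_page;
} soft_tlb_entry;

static soft_tlb_entry soft_tlb[TLB_ENTRIES];
static uint8_t guest_ram[4 * PAGE_SIZE];   /* toy guest memory */
static unsigned long slow_path_count;      /* number of slow-path refills */

/* Invalidate every entry; must be called once before the first access. */
static void soft_tlb_flush(void)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        soft_tlb[i].guest_page = INVALID_TAG;
        soft_tlb[i].host_page = NULL;
    }
}

/* Slow path: stands in for a guest page table walk. In this sketch the
 * guest page is simply mapped onto guest_ram, wrapping at its size. */
static uint8_t *slow_path_fill(uint32_t vaddr)
{
    uint32_t page = vaddr & PAGE_MASK;
    soft_tlb_entry *e = &soft_tlb[(vaddr >> PAGE_BITS) % TLB_ENTRIES];

    slow_path_count++;
    e->guest_page = page;
    e->host_page = &guest_ram[page % sizeof(guest_ram)];
    return e->host_page;
}

/* Emulated 32-bit load: index, tag compare, branch, and only then the
 * actual host memory access. */
static uint32_t soft_mmu_load32(uint32_t vaddr)
{
    soft_tlb_entry *e = &soft_tlb[(vaddr >> PAGE_BITS) % TLB_ENTRIES];
    uint8_t *host;
    uint32_t value;

    assert((vaddr & 3u) == 0);   /* sketch handles aligned accesses only */
    host = (e->guest_page == (vaddr & PAGE_MASK)) ? e->host_page
                                                  : slow_path_fill(vaddr);
    memcpy(&value, host + (vaddr & (PAGE_SIZE - 1)), sizeof value);
    return value;
}
```

Even on a hit, each simulated load pays for an index computation, a tag comparison, and a branch before touching memory. With the guest address space mapped through shadow page tables, as proposed above, the same access could in principle reduce to a plain host load, with the translation done by the hardware MMU.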
