Explorer Cycle-Accurate Performance Modelling in an Ultra-Fast Just-InTime Dynamic Binary Translation Instruction Set Simulator

Instruction set simulators (ISS) are vital tools for compiler and processor architecture design space exploration and verification. State-of-the-art simulators using just-in-time (JIT) dynamic binary translation (DBT) techniques are able to simulate complex embedded processors at speeds above 500 MIPS. However, these functional ISS do not provide microarchitectural observability. In contrast, low-level cycle-accurate ISS are too slow to simulate full-scale applications, forcing developers to revert to FPGA-based simulations. In this paper we demonstrate that it is possible to run ultra-high speed cycle-accurate instruction set simulations surpassing FPGA-based simulation speeds. We extend the JIT DBT engine of our ISS and augment JIT generated code with a verified cycle-accurate processor model. Our approach can model any microarchitectural configuration, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded processor implementing the ARCompact TM instruction set architecture (ISA). We achieve simulation speeds up to 88 MIPS on a standard x86 desktop computer for the industry standard EEMBC, COREMARK and BIOPERF benchmark suites.

[1]  Roland E. Wunderlich,et al.  SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[2]  Andreas Krall,et al.  Fast and Accurate Simulation using the LLVM Compiler Framework ? , 2008 .

[3]  Lei Gao,et al.  HySim: A fast simulation framework for embedded software development , 2007, 2007 5th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[4]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[5]  Lei Gao,et al.  A fast and generic hybrid simulation approach using C virtual machine , 2007, CASES '07.

[6]  James E. Fowler,et al.  Compiled instruction set simulation , 1991, Softw. Pract. Exp..

[7]  G. Braun,et al.  A universal technique for fast and flexible instruction-set architecture simulation , 2002, Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324).

[8]  Heinrich Meyr,et al.  Architecture implementation using the machine description language LISA , 2002, Proceedings of ASP-DAC/VLSI Design 2002. 7th Asia and South Pacific Design Automation Conference and 15h International Conference on VLSI Design.

[9]  Nigel Topham,et al.  High Speed CPU Simulation using JIT Binary Translation , 2015 .

[10]  Björn Franke,et al.  Fast cycle-approximate instruction set simulation , 2008, SCOPES '08.

[11]  Nikil D. Dutt,et al.  Instruction set compiled simulation: a technique for fast and flexible instruction set simulation , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[12]  Multiprocessor performance estimation using hybrid simulation , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[13]  Flávio Rech Wagner,et al.  Accurate software performance estimation using domain classification and neural networks , 2004, Proceedings. SBCCI 2004. 17th Symposium on Integrated Circuits and Systems Design (IEEE Cat. No.04TH8784).

[14]  Lei Gao,et al.  An Integrated Performance Estimation Approach in a Hybrid Simulation Framework , 2008 .

[15]  Mikko H. Lipasti,et al.  A dynamic binary translation approach to architectural simulation , 2001, CARN.

[16]  Xinping Zhu,et al.  A multiprocessing approach to accelerate retargetable and portable dynamic-compiled instruction-set simulation , 2006, Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06).

[17]  Olivier Temam,et al.  UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development , 2007, IEEE Computer Architecture Letters.

[18]  Björn Franke,et al.  Using continuous statistical machine learning to enable high-speed performance prediction in hybrid instruction-/cycle-accurate instruction set simulators , 2009, CODES+ISSS '09.

[19]  Gianluca Bontempi,et al.  A data analysis method for software performance prediction , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[20]  David Keppel,et al.  Shade: a fast instruction-set simulator for execution profiling , 1994, SIGMETRICS.

[21]  David A. Bader,et al.  BioPerf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications , 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005..

[22]  Mendel Rosenblum,et al.  Embra: fast and flexible machine simulation , 1996, SIGMETRICS '96.

[23]  C. May Mimic: a fast system/370 simulator , 1987, PLDI 1987.

[24]  Igor Böhm,et al.  Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator , 2010, 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.

[25]  Greg Hamerly,et al.  SimPoint 3.0: Faster and More Flexible Program Analysis , 2005 .

[26]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX ATC, FREENIX Track.

[27]  Anish Muttreja,et al.  Hybrid simulation for embedded software energy estimation , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[28]  Anish Muttreja,et al.  Hybrid Simulation for Energy Estimation of Embedded Software , 2007, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[29]  Luciano Lavagno,et al.  Software performance estimation strategies in a system-level design tool , 2000, Proceedings of the Eighth International Workshop on Hardware/Software Codesign. CODES 2000 (IEEE Cat. No.00TH8518).

[30]  Nigel P. Topham,et al.  High Speed CPU Simulation Using LTU Dynamic Binary Translation , 2009, HiPEAC.