Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator

Instruction set simulators (ISS) are vital tools for compiler and processor architecture design space exploration and verification. State-of-the-art simulators using just-in-time (JIT) dynamic binary translation (DBT) techniques are able to simulate complex embedded processors at speeds above 500 MIPS. However, these functional ISS do not provide microarchitectural observability. In contrast, low-level cycle-accurate ISS are too slow to simulate full-scale applications, forcing developers to resort to FPGA-based simulations. In this paper we demonstrate that it is possible to run ultra-high-speed cycle-accurate instruction set simulations surpassing FPGA-based simulation speeds. We extend the JIT DBT engine of our ISS and augment JIT-generated code with a verified cycle-accurate processor model. Our approach can model any microarchitectural configuration, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded processor implementing the ARCompact™ instruction set architecture (ISA). We achieve simulation speeds up to 63 MIPS on a standard x86 desktop computer, whilst the average cycle-count deviation is less than 1.5% for the industry-standard EEMBC and CoreMark benchmark suites.
