Transformer: A functional-driven cycle-accurate multicore simulator

Full-system simulators are extremely useful in evaluating design alternatives for multicore. However, state-of-the-art multicore simulators either lack good extensibility due to their tightly-coupled design between functional model (FM) and timing model (TM), or cannot guarantee cycle-accuracy. This paper conducts a comprehensive study on factors affecting cycle-accuracy and uncovers several contributing factors ignored before. Based on the study, we propose a loosely-coupled functional-driven full-system simulator for multicore, namely Transformer. To ensure extensibility and cycle-accuracy, Transformer leverages an architecture-independent interface between FM and TM and uses a lightweight scheme to detect and recover from execution divergence between FM and TM. Based on Transformer, a graduate student only needs to write about 180 lines of code and takes about two months to extend an X86 functional model (QEMU) in Transformer. Moreover, the loosely-coupled design also removes the complex interaction between FM and TM and opens the opportunity to parallelize FM and TM to improve performance. Experimental results show that Transformer achieves an average of 8.4% speedup over GEMS while guaranteeing the cycle-accuracy. A further parallelization between FM and TM leads to 35.3% speedup.

[1]  Haibo Chen,et al.  COREMU: a scalable and portable parallel full-system emulator , 2011, PPoPP '11.

[2]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX ATC, FREENIX Track.

[3]  Michael Adler,et al.  HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[4]  George Kurian,et al.  Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[5]  Hui Zeng,et al.  MPTLsim: A simulator for X86 multicore processors , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[6]  Thomas F. Wenisch,et al.  SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.

[7]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[8]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[9]  Jianwei Chen,et al.  Adaptive and Speculative Slack Simulations of CMPs on CMPs , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[10]  David A. Patterson,et al.  RAMP gold: An FPGA-based architecture simulator for multiprocessors , 2010, Design Automation Conference.

[11]  David A. Wood,et al.  Full-system timing-first simulation , 2002, SIGMETRICS '02.

[12]  Dam Sunwoo,et al.  FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators , 2007, MICRO.

[13]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[14]  Shunfei Chen,et al.  MARSS: A full system simulator for multicore x86 CPUs , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[15]  Paolo Faraboschi,et al.  COTSon: infrastructure for full system simulation , 2009, OPSR.

[16]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[17]  Dam Sunwoo,et al.  Accurate Functional-First Multicore Simulators , 2009, IEEE Computer Architecture Letters.

[18]  David A. Patterson,et al.  A case for FAME: FPGA architecture model execution , 2010, ISCA.