HSAemu - A full system emulator for HSA platforms

Heterogeneous System Architecture (HSA) is an open industry standard designed to support a large variety of data-parallel and task-parallel programming models. Currently, most of HSA hardware and software components are still in development. It is helpful to provide various heterogeneous simulation environments for HSA developers in developing HSA software stacks. This paper presents the design of HSAemu, a full system emulator for the HSA platform, and illustrates how those HSA features are implemented in the simulator. HSAemu provides an infrastructure of heterogeneous simulation environments by supporting required HSA features, including hUMA, hQ and HSAIL. Based on the infrastructure, HSAemu provide two simulation models, FastSim and DeepSim, for high-speed functional emulation and slow cycle-accurate simulation, respectively. In our preliminary experiments, HSAemu helps test a complete HSA software stack and profile system performance. Our case studies show that HSAemu is very useful as a hardware/software co-design tool for heterogeneous systems.

[1]  Wuu Yang,et al.  LLBT: an LLVM-based static binary translator , 2012, CASES '12.

[2]  Olivier Temam,et al.  UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development , 2007, IEEE Computer Architecture Letters.

[3]  Matt T. Yourst PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[4]  Sudhakar Yalamanchili,et al.  Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[5]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX ATC, FREENIX Track.

[6]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[7]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[8]  Pedro López,et al.  Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors , 2007, 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07).

[9]  Christoforos E. Kozyrakis,et al.  ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.

[10]  Julio Sahuquillo,et al.  Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors , 2007 .

[11]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[12]  John E. Stone,et al.  OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.

[13]  Yeh-Ching Chung,et al.  PQEMU: A Parallel System Emulator Based on QEMU , 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.

[14]  Yu Zhang,et al.  Parallelization of IBM mambo system simulator in functional modes , 2008, OPSR.

[15]  Haibo Chen,et al.  COREMU: a scalable and portable parallel full-system emulator , 2011, PPoPP '11.

[16]  David Defour,et al.  Barra: A Parallel Functional Simulator for GPGPU , 2010, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

[17]  Henry Wong,et al.  Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[18]  James E. Smith,et al.  Virtual machines - versatile platforms for systems and processes , 2005 .

[19]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[20]  Lixin Zhang,et al.  Mambo: a full system simulator for the PowerPC architecture , 2004, PERV.

[21]  Robert E. Lantz Fast Functional Simulation with Parallel Embra , 2008 .

[22]  H. Liu,et al.  Conference on Measurement and modeling of computer systems , 2001 .

[23]  Nam Sung Kim,et al.  GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.

[24]  Hui Zeng,et al.  MPTLsim: a cycle-accurate, full-system simulator for x86-64 multicore architectures with coherent caches , 2009, CARN.

[25]  Yeh-Ching Chung,et al.  A hybrid just-in-time compiler for android: comparing JIT types and the result of cooperation , 2012, CASES '12.

[26]  Mendel Rosenblum,et al.  Embra: fast and flexible machine simulation , 1996, SIGMETRICS '96.

[27]  Tei-Wei Kuo,et al.  A real-time, energy-efficient system software suite for heterogeneous multicore platforms , 2012, CODES+ISSS.

[28]  Chien-Min Wang,et al.  HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores , 2012, CGO '12.

[29]  Andreas Moshovos,et al.  Characterizing the performance benefits of fused CPU/GPU systems using FusionSim , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[30]  Michael F. P. O'Boyle,et al.  Using machine learning to focus iterative optimization , 2006, International Symposium on Code Generation and Optimization (CGO'06).