Execution-driven parallel simulation of PGAS applications on heterogeneous tiled architectures

We present a parallel execution-driven simulator for the efficient simulation of heterogeneous tile-based multi-core architectures. Such an architecture is composed of several tiles connected via a network-on-chip, and each tile contains local memory as well as several, possibly different, types of compute resources. The Partitioned Global Address Space (PGAS) programming model is well suited to programming such modern multi-core architectures. Providing performance estimates for parallel software and enabling architecture design space exploration both require fast functional and timing simulation techniques. Our simulator meets this requirement by combining a fast direct-execution simulation approach with different parallelization strategies. We propose four novel parallel discrete-event simulation techniques, which map thread-level parallelism within the applications to core-level parallelism on the target architecture and back to thread-level parallelism on the host machine. Achieving this requires correct synchronization and activation of the host threads, which is the main focus of this paper. Experiments with parallel real-world applications compare the techniques against each other and demonstrate that simulations 10.4 times faster than a sequential simulation can be achieved on a 12-core Intel Xeon processor.
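To make the host-thread synchronization problem concrete, the following is a minimal illustrative sketch. It is not the authors' implementation and does not reproduce any of the four proposed techniques. It assumes a generic quantum-based barrier scheme: each simulated core is run by one host thread, and all threads meet at a barrier every `quantum` simulated cycles. The names `simulate`, `core_workloads`, and `quantum` are hypothetical, and each workload is modeled simply as a list of task durations in simulated cycles.

```python
import threading

def simulate(core_workloads, quantum=100):
    """Hypothetical sketch: one host thread per simulated core,
    synchronized at a barrier every `quantum` simulated cycles.
    Returns (final simulated clock per core, number of sync rounds)."""
    n = len(core_workloads)
    queues = [list(w) for w in core_workloads]  # pending task durations per core
    local_time = [0] * n                        # simulated clock of each core
    state = {"done": False, "rounds": 0}

    def end_of_round():
        # Executed by exactly one thread while all others wait at the barrier,
        # so no queue is being modified when it is inspected here.
        state["rounds"] += 1
        state["done"] = not any(queues)

    barrier = threading.Barrier(n, action=end_of_round)

    def run_core(cid):
        limit = 0
        while True:
            limit += quantum
            # Advance this core until its clock reaches the end of the quantum.
            # A task that starts before the limit may overshoot it.
            while queues[cid] and local_time[cid] < limit:
                local_time[cid] += queues[cid].pop(0)
            barrier.wait()  # no core starts the next quantum early
            if state["done"]:
                return

    threads = [threading.Thread(target=run_core, args=(c,)) for c in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local_time, state["rounds"]

if __name__ == "__main__":
    clocks, rounds = simulate([[60, 60, 60], [200], [10] * 5], quantum=100)
    print(clocks, rounds)
```

In this sketch the quantum bounds how far the simulated clocks of different cores can drift apart between synchronization points. A smaller quantum tightens that bound at the cost of more barrier synchronizations, which is the general accuracy versus speed trade-off in barrier-synchronized parallel simulation.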
