Paper information - Effects of Nondeterminism in Hardware and Software Simulation with Thread Mapping

In this paper, we explore the simulation performance trade-off through the lens of Monte Carlo design space exploration for multi-threaded programs and thread mapping. The vehicle for this exploration is a recent study whose novel Google PageRank-based thread mapping approach is compared against hundreds of random mappings; a Round-Robin-based thread mapping approach proposed in this paper is evaluated in similar comparisons. The modern simulator landscape presents a choice between cycle-accurate but slow simulation and fast but inaccurate simulation. We find that a fast but inaccurate multi-threaded simulator, such as Sniper 5.3, exhibits large nondeterminism in the reported performance of the program. We perform cycle-accurate simulation, which demonstrates that the static thread mapping approach does provide benefits in reaching near-optimal design points. Furthermore, with a cycle-accurate simulator, static thread mapping requires significantly less runtime than a full Monte Carlo exploration of mapping design points.
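To make the comparison described in the abstract concrete, the following is a minimal sketch of a Monte Carlo exploration over random thread mappings set against a single round-robin mapping. It is not the paper's method or tooling: the 2D mesh topology, the hop-weighted `mapping_cost` function (a toy stand-in for a simulator run), the ring-shaped `traffic` pattern, and all function names are assumptions made purely for illustration.

```python
import random


def hop_distance(core_a, core_b, mesh_width):
    """Manhattan distance between two cores on a 2D mesh (assumed topology)."""
    ax, ay = core_a % mesh_width, core_a // mesh_width
    bx, by = core_b % mesh_width, core_b // mesh_width
    return abs(ax - bx) + abs(ay - by)


def mapping_cost(mapping, traffic, mesh_width):
    """Toy stand-in for one simulator run: traffic volume weighted by hops.

    mapping[t] is the core assigned to thread t; traffic maps
    (src_thread, dst_thread) pairs to a hypothetical communication volume.
    """
    return sum(
        volume * hop_distance(mapping[src], mapping[dst], mesh_width)
        for (src, dst), volume in traffic.items()
    )


def round_robin_mapping(num_threads, num_cores):
    """Assign thread t to core t mod num_cores."""
    return [t % num_cores for t in range(num_threads)]


def random_mapping(num_threads, num_cores, rng):
    """One Monte Carlo sample: distinct cores chosen uniformly at random.

    Requires num_threads <= num_cores (one thread per core).
    """
    return rng.sample(range(num_cores), num_threads)


def monte_carlo_costs(num_samples, num_threads, num_cores, traffic,
                      mesh_width, rng):
    """Evaluate the toy cost of num_samples random mappings."""
    return [
        mapping_cost(random_mapping(num_threads, num_cores, rng),
                     traffic, mesh_width)
        for _ in range(num_samples)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the exploration is repeatable
    mesh_width, num_cores, num_threads = 4, 16, 16
    # Hypothetical workload: each thread talks to its successor in a ring.
    traffic = {(t, (t + 1) % num_threads): 10 for t in range(num_threads)}

    sampled = monte_carlo_costs(200, num_threads, num_cores, traffic,
                                mesh_width, rng)
    rr_cost = mapping_cost(round_robin_mapping(num_threads, num_cores),
                           traffic, mesh_width)
    print("round-robin cost:", rr_cost)
    print("best / worst random cost:", min(sampled), max(sampled))
```

In this sketch each random sample costs one evaluation, whereas the static mapping is evaluated once. That mirrors the runtime argument in the abstract only in spirit, since a real evaluation would be a full simulator run rather than a closed-form cost.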

Baris Taskin | Ankit More | Mark Hempstead | Giordano Salvador | Siddharth Nilakantan
