Effects of Nondeterminism in Hardware and Software Simulation with Thread Mapping

In this paper, we explore the trade-off between simulation speed and accuracy through the lens of Monte Carlo design space exploration of thread mappings for multi-threaded programs. The vehicle for this exploration is a recent study whose novel Google PageRank-based thread mapping approach is compared against hundreds of random mappings. A Round-Robin-based thread mapping approach proposed in this paper is evaluated in similar comparisons. The modern simulator landscape presents a choice between cycle-accurate but slow simulation and fast but inaccurate simulation. We find that a fast, inaccurate multi-threaded simulator, such as Sniper 5.3, exhibits large nondeterminism in the reported performance of the program. We perform cycle-accurate simulation, which demonstrates that the static thread mapping approach does provide benefits in reaching near-optimal design points. Furthermore, with a cycle-accurate simulator, static thread mapping requires significantly less runtime than a full Monte Carlo exploration of mapping design points.
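To make the comparison concrete, the following is a minimal sketch of a round-robin mapping set against a Monte Carlo sample of random mappings. It is an illustration under stated assumptions, not the paper's implementation: the function names, the 2D-mesh layout, and the hop-count cost (standing in for a simulator run) are all hypothetical, and random placements assume at most one thread per core.

```python
import random

def round_robin_mapping(num_threads, num_cores):
    """Assign thread t to core t mod num_cores."""
    return [t % num_cores for t in range(num_threads)]

def random_mapping(num_threads, num_cores, rng):
    """Draw one random placement with at most one thread per core.

    Assumes num_threads <= num_cores (hypothetical simplification).
    """
    return rng.sample(range(num_cores), num_threads)

def hop_cost(mapping, traffic, mesh_width):
    """Toy stand-in for a simulation: traffic volume weighted by
    Manhattan distance between cores on a 2D mesh."""
    total = 0
    for (src, dst), volume in traffic.items():
        sx, sy = mapping[src] % mesh_width, mapping[src] // mesh_width
        dx, dy = mapping[dst] % mesh_width, mapping[dst] // mesh_width
        total += volume * (abs(sx - dx) + abs(sy - dy))
    return total

def monte_carlo_best(num_threads, num_cores, traffic, mesh_width,
                     samples, seed=0):
    """Evaluate `samples` random mappings and return (cost, mapping)
    for the cheapest one found."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    best = None
    for _ in range(samples):
        mapping = random_mapping(num_threads, num_cores, rng)
        cost = hop_cost(mapping, traffic, mesh_width)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best

if __name__ == "__main__":
    # Hypothetical workload: threads 0 and 3 exchange 2 units of traffic.
    traffic = {(0, 3): 2}
    rr = round_robin_mapping(4, 4)
    print("round-robin cost:", hop_cost(rr, traffic, mesh_width=2))
    print("best of 200 random:", monte_carlo_best(4, 4, traffic, 2, 200))
```

In a real study each cost evaluation would be a full simulator run, which is why the number of Monte Carlo samples dominates total exploration time while a static mapping needs only a single evaluation.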
