Weighted-Tuple Synchronization for Parallel Architecture Simulators

Simulation is a critical tool for evaluating processor and program performance and behavior in newly proposed computer architectures. When modeling target machines with hundreds or thousands of cores, parallel simulation is an increasingly popular way to reduce the long simulation times inherent in single-threaded simulation. Unfortunately, synchronization forces a tradeoff between performance and fidelity in these parallel simulators. In this work, we study the link between synchronization violations and architectural metric error, measured as CPI error. Further, we introduce weighted-tuple synchronization, a new distributed synchronization scheme that improves error-delay for parallel simulation. Each core periodically selects a group of synchronization targets, forming a synchronization tuple. The selecting core acts as the lead core and waits for the other cores in the tuple to catch up. Selection is random, but weighted to favor cores that cause more synchronization violations. With weighted-tuple synchronization and a synchronization interval of 100 cycles, average error-delay improves over barrier synchronization by 41% and over random-pair synchronization by 35%.
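
The selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-core violation counters, the per-core clock array, and the +1 weight smoothing are all assumptions made for illustration. It shows only the two ideas stated in the abstract, weighted random choice of targets and a lead core waiting on targets that are behind it.

```python
import random

def select_sync_tuple(lead, violations, tuple_size, rng):
    """Pick up to `tuple_size` distinct targets for `lead` (hypothetical helper).

    Cores with more recorded synchronization violations are more likely
    to be chosen. `violations[c]` is an assumed per-core counter.
    """
    candidates = [c for c in range(len(violations)) if c != lead]
    # Assumption: +1 smoothing keeps zero-violation cores selectable.
    weights = [violations[c] + 1 for c in candidates]
    chosen = []
    for _ in range(min(tuple_size, len(candidates))):
        # Weighted draw of one index, then remove it so targets stay distinct.
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(idx))
        weights.pop(idx)
    return chosen

def lagging_targets(lead, targets, clocks):
    """Return the targets whose local clock is behind the lead core's clock.

    In this sketch, these are the cores the lead would wait on.
    """
    return [t for t in targets if clocks[t] < clocks[lead]]

# Example: core 0 forms a tuple of 3 targets out of 8 simulated cores.
rng = random.Random(0)
violations = [0, 12, 3, 0, 40, 1, 0, 7]            # assumed counters
clocks = [500, 430, 510, 495, 380, 505, 520, 460]  # assumed local cycle counts
targets = select_sync_tuple(0, violations, 3, rng)
waiting_on = lagging_targets(0, targets, clocks)
```

In a real simulator this selection would run once per synchronization interval (100 cycles in the reported experiment), and how violation counts are gathered and aged is a design choice the abstract does not specify.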
