Weighted-Tuple: Fast and Accurate Synchronization for Parallel Architecture Simulators

Computer architecture research relies on software simulation to evaluate processor performance. Single-threaded simulators take unacceptably long to model complex architectures with hundreds of cores. Parallelizing a simulator can improve its performance, but parallel simulators must synchronize their threads, which forces them to trade performance for accuracy. We study relaxed synchronization policies for parallel architecture simulators and introduce the weighted-tuple synchronization policy, a distributed synchronization scheme that improves upon existing policies. We evaluate weighted-tuple in two parallel simulator settings: multicore simulation and network-on-chip simulation. In the multicore setting, weighted-tuple synchronization reduces average simulation time by 8 percent relative to barrier synchronization while also reducing error by 28 percent. In network-on-chip simulation, weighted-tuple synchronization improves simulation speed by 42 percent at the cost of a 0.3 percent increase in error compared to the barrier baseline.
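
The performance/accuracy trade-off of the barrier baseline can be illustrated with a small toy. This is a hypothetical sketch, not the weighted-tuple policy: it assumes the barrier baseline synchronizes all simulated cores every fixed quantum of simulated cycles, models each simulated core as a Python thread that only advances a cycle counter, and uses clock skew between cores as a rough stand-in for the timing error that relaxed synchronization can introduce. The function name `run_quantum_barrier_sim` and its parameters are illustrative assumptions.

```python
import threading


def run_quantum_barrier_sim(num_cores, total_cycles, quantum):
    """Hypothetical toy model of fixed-quantum barrier synchronization.

    Assumption: the baseline synchronizes every `quantum` simulated cycles
    (this is NOT the weighted-tuple policy). Each thread stands in for one
    simulated core. Returns (barriers crossed per core, largest clock skew
    observed at any instant), where skew = max local clock - min local clock.
    """
    barrier = threading.Barrier(num_cores)  # reusable (cyclic) barrier
    lock = threading.Lock()                 # guards clocks and max_skew
    clocks = [0] * num_cores
    barriers = [0] * num_cores
    stats = {"max_skew": 0}

    def core(cid):
        while clocks[cid] < total_cycles:
            # Simulate up to the next quantum boundary without synchronizing.
            boundary = min(clocks[cid] + quantum, total_cycles)
            while clocks[cid] < boundary:
                with lock:
                    clocks[cid] += 1
                    skew = max(clocks) - min(clocks)
                    stats["max_skew"] = max(stats["max_skew"], skew)
            # Block until every simulated core has reached the same boundary.
            barrier.wait()
            barriers[cid] += 1

    threads = [threading.Thread(target=core, args=(i,)) for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return barriers[0], stats["max_skew"]


if __name__ == "__main__":
    for q in (1, 10, 100):
        n_barriers, skew = run_quantum_barrier_sim(4, 1000, q)
        print(f"quantum={q}: {n_barriers} barriers per core, max skew {skew}")
```

In this toy, a larger quantum means fewer barriers per core (less synchronization overhead) but lets local clocks drift further apart before they are reconciled. The skew never exceeds the quantum, because no core can pass a boundary until every core has reached it; the exact skew observed depends on thread scheduling.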
