Wall-clock based synchronization: A parallel simulation technology for cluster systems

A common practice for reducing synchronization overhead in the parallel simulation of a large-scale cluster is to relax synchronization by lengthening the synchronous steps. As a side effect, however, simulation accuracy degrades considerably. This paper proposes a novel mechanism that keeps the running speeds of different nodes consistent by periodically synchronizing their logical clocks with the wall clock within each lax step. Because speed deviations between nodes are the main source of time causality errors, aligning node speeds causes only a modest loss of precision while achieving performance close to that of lax synchronization. The experimental results show that the mechanism improves performance by 2 to 11 times relative to the baseline barrier synchronization while maintaining high accuracy (e.g. 99% in most cases). Compared to the recently proposed adaptive mechanism, it also achieves a performance improvement of nearly 30%.
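
The idea of pacing each node against the wall clock inside a lax step can be illustrated with a minimal sketch. It is not the paper's implementation: the `NodePacer` class, the `slowdown` ratio (wall-clock seconds per simulated second), the checkpoint layout, and the modelled per-chunk costs are all assumptions made for illustration. At each checkpoint, a node that has simulated faster than the target ratio idles until the wall clock catches up, so fast nodes do not run ahead of slow ones between barriers.

```python
from dataclasses import dataclass


@dataclass
class NodePacer:
    """Hypothetical per-node pacer (an assumption, not the paper's API)."""

    slowdown: float  # assumed target: wall-clock seconds per simulated second

    def wait_needed(self, sim_elapsed: float, wall_elapsed: float) -> float:
        """Wall-clock seconds the node should idle at a checkpoint.

        The node is on schedule once sim_elapsed * slowdown wall-clock
        seconds have passed since the start of the lax step.
        """
        target_wall = sim_elapsed * self.slowdown
        return max(0.0, target_wall - wall_elapsed)


def run_lax_step(pacer: NodePacer, chunk_costs: list[float], sim_chunk: float) -> float:
    """Model one lax step split into len(chunk_costs) checkpoints.

    chunk_costs[i] is the modelled wall-clock cost of simulating chunk i,
    and sim_chunk is the simulated time covered by each chunk. Returns the
    wall-clock time at which the node reaches the end of the step.
    """
    wall = 0.0
    sim = 0.0
    for cost in chunk_costs:
        wall += cost                           # time spent simulating the chunk
        sim += sim_chunk                       # logical clock advances
        wall += pacer.wait_needed(sim, wall)   # idle if ahead of the wall clock
    return wall


if __name__ == "__main__":
    pacer = NodePacer(slowdown=8.0)
    fast = run_lax_step(pacer, [0.5, 0.5, 0.5, 0.5], sim_chunk=0.125)
    slow = run_lax_step(pacer, [1.0, 1.0, 1.0, 1.0], sim_chunk=0.125)
    print(fast, slow)
```

In this model the fast node is held back to the same pace as the node that exactly meets the target ratio, so both reach every checkpoint together. A node slower than the target is never delayed further; pacing alone cannot speed it up, which is why a barrier at the end of each lax step would still be needed.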
