SimK: A Large-Scale Parallel Simulation Engine

Simulation is an important method for evaluating future computer systems. Microprocessor architectures have shifted to parallel designs, yet almost all simulators remain sequential and cannot exploit the advantages of multi-core or many-core processors. This paper presents SimK, a parallel simulation engine that targets the prevalent SMP/CMP platforms and aims at large-scale, fine-grained computer system simulation. We introduce the highly efficient synchronization, communication, and buffer management policies used in SimK, and present a novel lock-free scheduling mechanism that avoids using any atomic instructions. To handle load fluctuation under light load, we propose a cooperative dynamic task migration scheme. Based on SimK, we have developed two large-scale parallel simulators, HppSim and HppNetSim, which simulate a full supercomputer system and its interconnection network, respectively. Results show that both HppSim and HppNetSim achieve good speedup on multiple processors, with the best normalized speedup reaching 14.95X on a two-way quad-core server.
