SlackSim: a platform for parallel simulations of CMPs on CMPs

Fast simulation of chip multiprocessors (CMPs) is a critical challenge for the architecture research community as both industry and academia shift their research focus to multicore design. Parallel simulation accelerates microarchitecture simulation of CMPs by exploiting their inherent parallelism. In this paper, we explore the paradigm of simulating each core of a target CMP in its own thread and spreading those threads across the hardware thread contexts of a host CMP. We implement several parallel simulation schemes using POSIX Threads (Pthreads). We start with cycle-by-cycle simulation and then relax the synchronization condition in various schemes, which we call slack simulations. In slack simulations, the Pthreads simulating different target cores do not synchronize after each simulated cycle; instead, they are given some slack. The slack is the difference, in cycles, between the simulated times of any two target cores. Small slacks, such as a few cycles, greatly improve the efficiency of parallel CMP simulations with no or negligible simulation error. We have developed a simulation framework called SlackSim to experiment with various slack simulation schemes. Unlike previous attempts to parallelize multiprocessor simulations on distributed-memory machines, SlackSim takes advantage of the efficient sharing of data in the host CMP architecture. We demonstrate the efficiency and accuracy of some well-known slack simulation schemes and of some new ones on SlackSim running on a state-of-the-art CMP platform.
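To make the idea of slack concrete, the following is a minimal sketch of one possible scheme, a bounded slack, written with Pthreads. It is not SlackSim's code: the names `sim_t`, `core_thread`, `run_slack_sim`, and `NUM_CORES` are hypothetical, and the per-cycle core model is left as a placeholder comment. Each thread simulates one target core and may begin its next cycle only while it leads the slowest core by at most `slack` cycles; a slack of 0 reduces to cycle-by-cycle simulation.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_CORES 4   /* hypothetical number of target cores */

/* Shared simulation state (illustrative, not SlackSim's data structures). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  advanced;        /* signaled whenever a core finishes a cycle */
    long local_time[NUM_CORES];      /* cycles completed by each target core */
    long slack;                      /* allowed lead over the slowest core */
    long end_cycle;                  /* cycles to simulate per core */
    long max_lead;                   /* largest lead observed when a cycle began */
} sim_t;

typedef struct { sim_t *sim; int id; } core_arg_t;

/* Simulated time of the slowest core; the caller must hold s->lock. */
static long min_time(const sim_t *s) {
    long m = s->local_time[0];
    for (int i = 1; i < NUM_CORES; i++)
        if (s->local_time[i] < m) m = s->local_time[i];
    return m;
}

/* One host thread per target core. */
static void *core_thread(void *p) {
    core_arg_t *a = p;
    sim_t *s = a->sim;
    long t = 0;
    while (t < s->end_cycle) {
        long lead;
        pthread_mutex_lock(&s->lock);
        /* Block while this core is more than `slack` cycles ahead. */
        while ((lead = t - min_time(s)) > s->slack)
            pthread_cond_wait(&s->advanced, &s->lock);
        if (lead > s->max_lead) s->max_lead = lead;
        pthread_mutex_unlock(&s->lock);

        /* Placeholder: simulate one cycle of target core a->id here. */

        pthread_mutex_lock(&s->lock);
        s->local_time[a->id] = ++t;
        pthread_cond_broadcast(&s->advanced);
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Runs all cores to end_cycle and returns the largest lead observed. */
long run_slack_sim(long slack, long end_cycle) {
    sim_t s = { .slack = slack, .end_cycle = end_cycle };
    pthread_t tid[NUM_CORES];
    core_arg_t args[NUM_CORES];

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.advanced, NULL);
    for (int i = 0; i < NUM_CORES; i++) {
        args[i] = (core_arg_t){ &s, i };
        int rc = pthread_create(&tid[i], NULL, core_thread, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_CORES; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NUM_CORES; i++)
        assert(s.local_time[i] == end_cycle);
    pthread_cond_destroy(&s.advanced);
    pthread_mutex_destroy(&s.lock);
    return s.max_lead;
}
```

Calling `run_slack_sim(0, 1000)` forces lockstep execution, so the returned lead is 0; with a positive slack the returned lead never exceeds that slack, and faster threads block less often. The slowest core is never blocked, so the scheme cannot deadlock. Compile with `-pthread`.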
