Adaptive and Speculative Slack Simulations of CMPs on CMPs

Current trends signal an imminent crisis in the simulation of future CMPs (Chip Multiprocessors). Future micro-architectures will offer more and more thread contexts to execute parallel programs, but the execution speed of each thread will not improve at the same pace. CMPs with tens or even hundreds of cores are envisioned. Simulating these future CMPs efficiently without compromising accuracy is a challenge. Slack simulation is a general parallel simulation paradigm that provides flexible trade-offs between simulation accuracy and speed. Simulation threads do not synchronize after every target core cycle, as they do in cycle-by-cycle simulation. Instead, a maximum slack (the slack bound) is enforced between the clocks of all simulated cores. A slack simulation may become inaccurate because of simulation violations, which occur when two cores access a resource in one order in the simulation and in a different order in the target system. We introduce and demonstrate techniques to detect violations, to adapt the simulation slack to maintain a target violation rate, and to checkpoint and roll back a slack simulation when violations are detected. We report simulation performance and accuracy data for a set of five Splash benchmarks in the context of an 8-core CMP with a snooping cache coherence protocol, simulated on SlackSim, our universal slack simulation platform.
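The three ideas named in the abstract (a slack bound on core clocks, violation detection on a shared resource, and slack adaptation toward a target violation rate) can be illustrated with a minimal Python sketch. It is not the SlackSim implementation: the names `may_advance`, `ViolationDetector`, and `adapt_slack` are hypothetical, violations are assumed to be detectable by comparing simulated timestamps of successive accesses to a resource, and the halve-on-excess, increment-otherwise policy is an assumed stand-in for whatever adaptation rule the paper uses. Checkpointing and rollback are not modeled.

```python
def may_advance(clocks, core, slack_bound):
    """Return True if `core` may simulate one more target cycle.

    After the tick, the core's clock must be at most `slack_bound`
    cycles ahead of the slowest core (assumes slack_bound >= 1).
    """
    return clocks[core] + 1 - min(clocks) <= slack_bound


class ViolationDetector:
    """Flags accesses that reach a shared resource out of target-time order.

    Assumption: each access carries the simulated cycle at which it was
    issued; an access older than one already processed is a violation.
    """

    def __init__(self):
        self.latest = {}      # resource name -> newest timestamp seen
        self.accesses = 0
        self.violations = 0

    def access(self, resource, timestamp):
        self.accesses += 1
        newest = self.latest.get(resource, -1)
        violated = timestamp < newest
        if violated:
            self.violations += 1
        self.latest[resource] = max(newest, timestamp)
        return violated

    def rate(self):
        return self.violations / self.accesses if self.accesses else 0.0


def adapt_slack(slack_bound, violation_rate, target_rate,
                min_slack=1, max_slack=1024):
    """Hypothetical policy: halve the slack when the measured violation
    rate exceeds the target, otherwise grow it by one cycle."""
    if violation_rate > target_rate:
        return max(min_slack, slack_bound // 2)
    return min(max_slack, slack_bound + 1)
```

In this sketch a small slack bound keeps the simulated clocks close together, which limits how far out of order two accesses to the same resource can be processed, while a large bound lets simulation threads run ahead at the cost of more violations. The adaptation step moves the bound between those extremes based on the observed rate.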
