Exploiting parallelism and structure to accelerate the simulation of chip multi-processors

Simulation is an important means of evaluating new microarchitectures. Current trends toward chip multiprocessors (CMPs) challenge designers' ability to develop efficient simulators. CMP simulation speed can be improved by exploiting the parallelism inherent in the CMP simulation model, either by running the simulation on multiple processors or by integrating multiple hardware processors into the simulation in place of simulated ones. Doing so usually requires tedious manual parallelization or redesign to encapsulate the processors. This paper presents techniques for automated simulator parallelization and hardware integration for CMP structural models. We show that automated parallelization can achieve a 7.60× speedup for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor. We demonstrate the power of hardware integration by integrating eight hardware PowerPC cores into a CMP model, achieving speedups of up to 5.82×.