Automated modeling and emulation of interconnect designs for many-core chip multiprocessors

Simulation of new multi- and many-core systems is becoming an increasingly large bottleneck in the design process. This paper presents the ACME design automation tool flow, which facilitates hardware emulation of newly proposed large multi-core interconnection networks on FPGAs to mitigate the slowdown of single-threaded, event-driven simulation. The tool is aimed at computer and network architects who have knowledge of digital design but may not be comfortable with hardware description languages and synthesis flows. ACME uses a graphical entry method that allows hardware components to be mixed with software algorithms written in C, each with a user-defined latency and throughput expressed in system cycles. ACME automatically generates a cycle-accurate hardware emulator as a Xilinx Platform Studio project, which integrates synthesized hardware blocks with embedded soft-core processors that execute the C code. Our results demonstrate that for 16-core and 64-core cycle-accurate packet-switching networks, the FPGA-based emulation is faster than Simics-based software simulation by 2.5x and 14.6x, respectively.
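To make the notion of a per-block latency and throughput in system cycles concrete, the following is a minimal Python sketch of one possible cycle-level model. It is not ACME's implementation or generated output; the names `TimedBlock`, `offer`, `collect`, and `run` are hypothetical, and throughput is assumed here to mean "one item accepted every `interval` cycles".

```python
from collections import deque


class TimedBlock:
    """Hypothetical block model (an assumption, not part of ACME).

    Accepts at most one item every `interval` cycles (throughput) and
    releases each item `latency` cycles after it was accepted.
    """

    def __init__(self, latency, interval):
        self.latency = latency
        self.interval = interval
        self.next_accept = 0       # earliest cycle at which a new item may enter
        self.in_flight = deque()   # (ready_cycle, item), in acceptance order

    def offer(self, cycle, item):
        """Try to push an item in this cycle; return True if accepted."""
        if cycle < self.next_accept:
            return False
        self.in_flight.append((cycle + self.latency, item))
        self.next_accept = cycle + self.interval
        return True

    def collect(self, cycle):
        """Return all items whose latency has elapsed by this cycle."""
        out = []
        while self.in_flight and self.in_flight[0][0] <= cycle:
            out.append(self.in_flight.popleft()[1])
        return out


def run(block, items, max_cycles=1000):
    """Step the block one system cycle at a time; return (cycle, item) pairs."""
    pending = deque(items)
    done = []
    for cycle in range(max_cycles):
        for item in block.collect(cycle):
            done.append((cycle, item))
        if pending and block.offer(cycle, pending[0]):
            pending.popleft()
        if not pending and not block.in_flight:
            break
    return done


if __name__ == "__main__":
    # Latency of 3 cycles, one new item accepted every 2 cycles.
    for cycle, item in run(TimedBlock(latency=3, interval=2), ["a", "b", "c"]):
        print(cycle, item)
```

In this sketch, a software loop advances a global cycle counter, which is exactly the kind of single-threaded stepping that becomes slow as the number of modeled blocks grows; an FPGA emulator would instead evaluate all blocks in parallel on each cycle.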
