Fast and cycle-accurate modeling of a multicore processor

An ideal simulator allows an architect to swiftly explore design alternatives and accurately determine their impact on performance. Design exploration requires simulators to be easily modifiable, and accurate performance estimates require detailed models. Unfortunately, detailed modeling not only impacts the ease with which a simulator can be modified, but also the speed at which it can be executed, resulting in fidelity being traded for simulation speed. Although FPGA-based simulators have dramatically higher speed than software simulators, sacrificing fidelity is still common. In this paper we present Arete, an FPGA-based processor simulator, which offers high performance along with accuracy and modifiability. We begin with a cycle-level specification of a multicore architecture which includes realistic in-order cores and detailed models of shared, coherent memory and on-chip network. We then describe how this specification is implemented faithfully and efficiently on FPGAs. Arete delivers a performance of up to 11 MIPS per core. We run a subset of the PARSEC benchmark suite on top of off-the-shelf SMP Linux, and achieve an average performance of 55 MIPS for an 8-core model.We also describe two significant architectural explorations: one involving three different branch predictors and the other requiring major modifications to the cache-coherence protocol.

[1]  D. Culler,et al.  Active Messages: A Mechanism for Integrated Communication and Computation , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.

[2]  Seth Copen Goldstein,et al.  Active messages: a mechanism for integrating communication and computation , 1998, ISCA '98.

[3]  Nicholas Wells,et al.  BusyBox: A Swiss Army Knife for Linux , 2000 .

[4]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[5]  Christopher J. Hughes,et al.  RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors , 2002, Computer.

[6]  Laxmikant V. Kalé,et al.  BigSim: a parallel simulator for performance prediction of extremely large parallel machines , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[7]  Rishiyur S. Nikhil,et al.  Bluespec System Verilog: efficient, correct RTL from high level specifications , 2004, Proceedings. Second ACM and IEEE International Conference on Formal Methods and Models for Co-Design, 2004. MEMOCODE '04..

[8]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[9]  Luca Benini,et al.  MPARM: Exploring the Multi-Processor SoC Design Space with SystemC , 2005, J. VLSI Signal Process..

[10]  Mario Diaz-Nava,et al.  An open platform for developing multiprocessor SoCs , 2005, Computer.

[11]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX ATC, FREENIX Track.

[12]  Luca Benini,et al.  A Complete Multi-Processor System-on-Chip FPGA-Based Emulation Framework , 2006, 2006 IFIP International Conference on Very Large Scale Integration.

[13]  Ronald G. Dreslinski,et al.  The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.

[14]  W. H. Reinhart,et al.  FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators , 2007, MICRO.

[15]  John Wawrzynek,et al.  RAMP Blue: A Message-Passing Manycore System in FPGAs , 2007, 2007 International Conference on Field Programmable Logic and Applications.

[16]  Arvind,et al.  Quick Performance Models Quickly: Closely-Coupled Partitioned Simulation on FPGAs , 2008, ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software.

[17]  Arvind,et al.  A-Port Networks: Preserving the Timed Behavior of Synchronous Systems for Modeling on FPGAs , 2009, TRETS.

[18]  Babak Falsafi,et al.  ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs , 2009, TRETS.

[19]  Dam Sunwoo,et al.  Accurate Functional-First Multicore Simulators , 2009, IEEE Computer Architecture Letters.

[20]  Paolo Faraboschi,et al.  COTSon: infrastructure for full system simulation , 2009, OPSR.

[21]  Arvind,et al.  Bounded Dataflow Networks and Latency-Insensitive circuits , 2009, 2009 7th IEEE/ACM International Conference on Formal Methods and Models for Co-Design.

[22]  Niraj K. Jha,et al.  GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[23]  Chen Chang,et al.  BEE3: Revitalizing Computer Architecture Research , 2009 .

[24]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[25]  David A. Patterson,et al.  RAMP gold: An FPGA-based architecture simulator for multiprocessors , 2010, Design Automation Conference.

[26]  Robert Tappan Morris,et al.  An Analysis of Linux Scalability to Many Cores , 2010, OSDI.

[27]  George Kurian,et al.  Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[28]  Srinivas Devadas,et al.  Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System , 2011, 2011 21st International Conference on Field Programmable Logic and Applications.

[29]  Michael Adler,et al.  HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[30]  Mateo Valero,et al.  From Plasma to BeeFarm: Design Experience of an FPGA-Based Multicore Prototype , 2011, ARC.