The FAST methodology for high-speed SoC/computer simulation

This paper describes the FAST methodology, which enables a single FPGA to accelerate cycle-accurate computer system simulators that model modern, realistic SoCs, embedded systems and standard desktop, laptop and server computer systems. The methodology partitions a simulator into (i) a functional model that simulates the functionality of the computer system and (ii) a predictive model that predicts performance and other metrics. The partitioning is crafted to map most of the parallel work onto a hardware-based predictive model, eliminating much of the complexity and difficulty of simulating parallel constructs on a sequential platform. FAST conventions and libraries are designed to make creating, modifying, using and measuring such simulators straightforward. We describe a prototype FAST system: a full-system computer system simulator, capable of RTL-level cycle accuracy, that executes the x86 ISA, boots unmodified Linux and runs unmodified x86 applications. The prototype runs two to three orders of magnitude faster than the fastest Intel and AMD RTL-level cycle-accurate x86 software-based simulators and about six to seven times faster than RTL simulation.
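The functional/predictive partitioning can be illustrated with a minimal sketch. It is not the FAST implementation: both halves run in software here, whereas FAST places the predictive model in FPGA hardware, and the toy three-opcode ISA, the latency table, the in-order single-issue timing rule and all names (`functional_model`, `timing_model`, `TraceEntry`, `PROGRAM`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy ISA: (opcode, dest, src_a, src_b_or_immediate).
Instr = Tuple[str, str, str, object]


@dataclass
class TraceEntry:
    """What the functional model tells the predictive model per instruction."""
    pc: int
    opcode: str
    dest: str
    sources: Tuple[str, ...]


def functional_model(program: List[Instr], regs: Dict[str, int]) -> Iterator[TraceEntry]:
    """Execute instructions for their architectural effect only (no timing)."""
    pc = 0
    while pc < len(program):
        op, dest, a, b = program[pc]
        if op == "li":                      # load immediate
            regs[dest] = int(b)
            sources: Tuple[str, ...] = ()
        elif op == "add":
            regs[dest] = regs[a] + regs[str(b)]
            sources = (a, str(b))
        elif op == "mul":
            regs[dest] = regs[a] * regs[str(b)]
            sources = (a, str(b))
        else:
            raise ValueError(f"unknown opcode: {op}")
        yield TraceEntry(pc, op, dest, sources)
        pc += 1


# Assumed per-opcode latencies in cycles (illustrative values).
LATENCY = {"li": 1, "add": 1, "mul": 3}


def timing_model(trace: Iterator[TraceEntry]) -> int:
    """Predict total cycles for an in-order, single-issue machine.

    An instruction issues at least one cycle after the previous one
    and not before all of its source registers are ready.
    """
    ready: Dict[str, int] = {}   # register -> cycle its value becomes available
    next_issue = 0
    finish = 0
    for entry in trace:
        issue = max([next_issue] + [ready.get(s, 0) for s in entry.sources])
        done = issue + LATENCY[entry.opcode]
        ready[entry.dest] = done
        next_issue = issue + 1
        finish = max(finish, done)
    return finish


PROGRAM: List[Instr] = [
    ("li", "r1", "", 2),
    ("li", "r2", "", 3),
    ("mul", "r3", "r1", "r2"),
    ("add", "r4", "r3", "r1"),
]
```

Calling `timing_model(functional_model(PROGRAM, regs))` with an empty `regs` dictionary fills `regs` with the architectural results and returns the predicted cycle count. The predictive model never computes data values; it sees only the instruction trace, which is the property that allows the two halves to be developed and accelerated separately.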
