HAsim: Cycle-Accurate Multicore Performance Models on FPGAs

The goal of this project is to improve computer architecture by accelerating cycle-accurate performance modeling of multicore processors using FPGAs. Contributions include a distributed technique for controlling simulation on a highly parallel substrate, hardware design techniques that reduce development effort, and a specific framework for modeling shared-memory multicore processors paired with realistic on-chip networks. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
