Golden Gate: Bridging The Resource-Efficiency Gap Between ASICs and FPGA Prototypes

We present Golden Gate, an FPGA-based simulation tool that decouples the timing of an FPGA host platform from that of the target RTL design. In contrast to previous work on static time-multiplexing of FPGA resources, Golden Gate employs the Latency-Insensitive Bounded Dataflow Network (LI-BDN) formalism to decompose the simulator into subcomponents, each of which may be independently and automatically optimized. This structure allows Golden Gate to support a broad class of optimizations that improve resource utilization by implementing FPGA-hostile structures over multiple host cycles, while the LI-BDN formalism ensures that the simulator still produces bit- and cycle-exact results. To verify that these optimizations are implemented correctly, we also present LIME, a model-checking tool that provides a push-button flow for checking that optimized subcomponents adhere to an associated correctness specification and guarantee forward progress. Finally, we use Golden Gate to generate a cycle-exact simulator of a multi-core SoC, in which we reduce LUT utilization by up to 26% by coercing multi-ported, combinationally read memories into simulation models backed by time-multiplexed block RAMs. This enables us to simulate 50% more cores on a single FPGA.
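The memory optimization described above can be illustrated with a small software sketch. It is not Golden Gate's implementation: the class names, the port counts, and the read-then-write service order are assumptions chosen for illustration. It models a multi-ported target memory with combinational reads, then a replacement that performs one single-ported RAM access per host cycle, and checks that both return identical read data in every target cycle.

```python
import random


class ReferenceMemory:
    """Target memory: all ports act within one target cycle; reads are combinational."""

    def __init__(self, depth):
        self.data = [0] * depth

    def step(self, read_addrs, writes):
        # Reads observe the state before this cycle's clock edge.
        read_data = [self.data[a] for a in read_addrs]
        # Writes commit at the clock edge, in port order.
        for addr, value in writes:
            self.data[addr] = value
        return read_data


class TimeMultiplexedModel:
    """Hypothetical model: one single-ported RAM access per host cycle."""

    def __init__(self, depth):
        self.ram = [0] * depth
        self.host_cycles = 0

    def _access(self, addr, value=None):
        # Each call stands for one host cycle on the single RAM port.
        self.host_cycles += 1
        if value is None:
            return self.ram[addr]
        self.ram[addr] = value
        return None

    def step(self, read_addrs, writes):
        # Service every read before any write so reads still see pre-edge state.
        read_data = [self._access(a) for a in read_addrs]
        for addr, value in writes:
            self._access(addr, value)
        return read_data


def run(target_cycles=1000, depth=16, n_read=2, n_write=1, seed=0):
    """Drive both models with the same random traffic and compare read data."""
    rng = random.Random(seed)
    ref = ReferenceMemory(depth)
    model = TimeMultiplexedModel(depth)
    for _ in range(target_cycles):
        reads = [rng.randrange(depth) for _ in range(n_read)]
        writes = [(rng.randrange(depth), rng.randrange(256)) for _ in range(n_write)]
        if ref.step(reads, writes) != model.step(reads, writes):
            return False, model.host_cycles
    return True, model.host_cycles


if __name__ == "__main__":
    matches, host_cycles = run()
    print(matches, host_cycles)
```

With two read ports and one write port, the sketch spends three host cycles per target cycle. The target-visible behavior is unchanged, and the cost shows up only as a lower simulation rate, which is the trade the abstract describes.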
