Paper information - Golden Gate: Bridging The Resource-Efficiency Gap Between ASICs and FPGA Prototypes

We present Golden Gate, an FPGA-based simulation tool that decouples the timing of an FPGA host platform from that of the target RTL design. In contrast to previous work in static time-multiplexing of FPGA resources, Golden Gate employs the Latency-Insensitive Bounded Dataflow Network (LI-BDN) formalism to decompose the simulator into subcomponents, each of which may be independently and automatically optimized. This structure allows Golden Gate to support a broad class of optimizations that improve resource utilization by implementing FPGA-hostile structures over multiple cycles, while the LI-BDN formalism ensures that the simulator still produces bit- and cycle-exact results. To verify that these optimizations are implemented correctly, we also present LIME, a model-checking tool that provides a push-button flow for checking whether optimized subcomponents adhere to an associated correctness specification, while also guaranteeing forward progress. Finally, we use Golden Gate to generate a cycle-exact simulator of a multi-core SoC, where we reduce LUT utilization by up to 26% by coercing multi-ported, combinationally read memories into simulation models backed by time-multiplexed block RAMs, enabling us to simulate 50% more cores on a single FPGA.
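The memory optimization described in the abstract can be illustrated with a small software analogy. The sketch below is not Golden Gate's implementation (which transforms RTL and uses LI-BDN token channels); it only shows the idea of serving a multi-ported, combinationally read target memory from a single-ported host RAM over several host cycles while target time advances by exactly one cycle. All names here (`ReferenceMemory`, `TimeMultiplexedMemory`, `run_trace`, `TRACE`) are hypothetical and chosen for illustration.

```python
class ReferenceMemory:
    """Target memory: every port is served within one target cycle."""

    def __init__(self, depth):
        self.data = [0] * depth

    def cycle(self, read_addrs, writes):
        # Combinational reads observe the state before this cycle's writes.
        out = [self.data[addr] for addr in read_addrs]
        for addr, value in writes:
            self.data[addr] = value
        return out


class TimeMultiplexedMemory:
    """Host-side model: a single-ported RAM, one access per host cycle."""

    def __init__(self, depth):
        self.data = [0] * depth
        self.host_cycles = 0    # cycles spent on the (assumed) FPGA host
        self.target_cycles = 0  # cycles of simulated target time

    def _read(self, addr):
        self.host_cycles += 1
        return self.data[addr]

    def _write(self, addr, value):
        self.host_cycles += 1
        self.data[addr] = value

    def cycle(self, read_addrs, writes):
        # Serve all reads first so they see pre-write state, then the writes.
        out = [self._read(addr) for addr in read_addrs]
        for addr, value in writes:
            self._write(addr, value)
        # Target time advances only after every port has been handled.
        self.target_cycles += 1
        return out


def run_trace(trace, depth=8):
    """Check the host model against the reference, target cycle by cycle."""
    ref = ReferenceMemory(depth)
    model = TimeMultiplexedMemory(depth)
    for read_addrs, writes in trace:
        assert ref.cycle(read_addrs, writes) == model.cycle(read_addrs, writes)
    return model.target_cycles, model.host_cycles


# Each entry is one target cycle: (read addresses, list of (addr, value) writes).
TRACE = [
    ([0, 1], [(0, 5)]),
    ([0, 1], [(1, 7)]),
    ([0, 1], []),
]

if __name__ == "__main__":
    target, host = run_trace(TRACE)
    print(f"target cycles: {target}, host cycles: {host}")
```

In this toy model the read values match the reference on every target cycle, so the target-visible behavior is unchanged; the cost appears only as extra host cycles per target cycle. This mirrors, under the stated simplifications, the trade of simulation throughput for fewer FPGA resources.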
