Hoplite-DSP: Harnessing the Xilinx DSP48 multiplexers to efficiently support NoCs on FPGAs

We embed the crossbar functionality of NoC (network-on-chip) routers in the hard multiplexers of Xilinx DSP48E primitives to enable a resource-efficient mapping of FPGA overlay NoCs. This embedding also lets the vertical NoC channels use the dedicated hard wiring of the DSP cascade links. As a result, the mapping significantly reduces the soft-logic (LUT + FF) utilization of FPGA overlay NoCs at the expense of DSP resources, while also lowering the routing demand on the configurable FPGA interconnect. The embedding is made possible by the dynamic mode control feature of the DSP blocks, which allows the ALU operation and the multiplexer data-steering controls within the block to be changed on every cycle. We multi-pump the DSP block, operating it internally at 600-650 MHz while delivering fabric-facing frequencies of 300-325 MHz. For a 48b-wide, chip-spanning 32×16 NoC mapped onto an XC7VX485T (VC707 board), a LUT-only implementation of the Hoplite router requires on average ≈70 LUTs + 140 FFs at 2.7 ns, whereas the DSP-based router requires 1 DSP48 block + ≈13 LUTs + 17 FFs at 2.8 ns. At 15% toggle rates, and across most system sizes, the DSP-based NoC that exploits hard resources consumes 1.1-2× less power than the LUT-based NoC. Across a range of statistical workloads, it matches the performance of LUT-only Hoplite, delivering a sustained rate as high as 8-10% at a 100% injection rate for the LOCAL traffic pattern on a 16×16 NoC. Previous work has shown that a conventional hard NoC router with virtual channels and FIFO buffers is 20-23× smaller, 5-6× faster, and up to 14× lower power than an equivalent soft NoC router. Our DSP-based Hoplite soft NoC router requires practically the same silicon area as that conventional hard router, runs only 3× slower, and consumes 43% less power, while sacrificing certain communication properties in favor of a lean implementation.
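
The per-cycle switching decision that the abstract maps onto the DSP multiplexers can be sketched in software. The model below is a minimal behavioural sketch, not the authors' RTL, and it assumes the usual Hoplite organisation: a unidirectional 2D torus with dimension-ordered (X-then-Y) routing, inputs W, N, and PE, outputs E and S with S doubling as the local exit, and a fixed priority N > W > PE under which a W packet that loses the S port is deflected E. The names `Packet` and `hoplite_route` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    dx: int  # destination column
    dy: int  # destination row
    payload: int = 0


def hoplite_route(
    x: int,
    y: int,
    w: Optional[Packet],
    n: Optional[Packet],
    pe: Optional[Packet],
) -> Tuple[Dict[str, Optional[Packet]], bool]:
    """One cycle of a sketched Hoplite-style router at (x, y).

    Assumed priority: N > W > PE. There is no N->E path, so an N packet
    always takes S; a W packet that wants S while N holds it is deflected E.
    Returns the output ports and whether the PE injection was accepted.
    """
    east: Optional[Packet] = None
    south: Optional[Packet] = None

    # N input: already in its destination column, keeps moving down the Y ring.
    if n is not None:
        south = n

    # W input: travel east until the column matches, then try to turn south.
    if w is not None:
        if w.dx == x and south is None:
            south = w
        else:
            east = w  # still travelling in X, or deflected around the X ring

    # PE input: inject only if the port it needs is still free this cycle.
    pe_accepted = False
    if pe is not None:
        if pe.dx != x and east is None:
            east, pe_accepted = pe, True
        elif pe.dx == x and south is None:
            south, pe_accepted = pe, True

    # The S output doubles as the local exit once the packet has arrived.
    exit_pkt: Optional[Packet] = None
    if south is not None and (south.dx, south.dy) == (x, y):
        exit_pkt, south = south, None

    return {"E": east, "S": south, "EXIT": exit_pkt}, pe_accepted
```

For example, `hoplite_route(2, 1, w=Packet(2, 3), n=Packet(2, 0), pe=None)` places the W packet on E (deflected, because the N packet already holds S) and the N packet on S. In this sketch every decision is a small selection among at most three inputs per output, which is the kind of steering the abstract attributes to the DSP's dynamically controlled multiplexers.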

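The per-router averages quoted above can be scaled to the full 32×16 NoC (512 routers) for a rough, first-order estimate of total soft-logic savings. This sketch only multiplies the stated averages, so actual post-place-and-route totals will likely differ somewhat; the names `LUT_ONLY`, `DSP_BASED`, and `noc_totals` are illustrative.

```python
# Per-router averages quoted for the 48b-wide 32x16 NoC on the XC7VX485T.
LUT_ONLY = {"LUT": 70, "FF": 140, "DSP": 0}
DSP_BASED = {"LUT": 13, "FF": 17, "DSP": 1}


def noc_totals(per_router: dict, cols: int, rows: int) -> dict:
    """Scale per-router averages to a full cols x rows NoC."""
    routers = cols * rows
    return {res: count * routers for res, count in per_router.items()}


soft = noc_totals(LUT_ONLY, 32, 16)
hard = noc_totals(DSP_BASED, 32, 16)

for res in ("LUT", "FF"):
    ratio = soft[res] / hard[res]
    print(f"{res}: {soft[res]} vs {hard[res]} ({ratio:.1f}x fewer)")
print(f"DSP48: {hard['DSP']}")
# LUT: 35840 vs 6656 (5.4x fewer)
# FF: 71680 vs 8704 (8.2x fewer)
# DSP48: 512
```

Under this estimate, the DSP-based NoC trades 512 DSP48 blocks for roughly 5.4× fewer LUTs and 8.2× fewer FFs than the LUT-only NoC.
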