HopliteRT: An efficient FPGA NoC for real-time applications

Overlay NoCs, such as Hoplite, are cheap to implement on an FPGA but, because they use deflection routing, provide no bounds on the worst-case routing latency of packets traversing the NoC. In this paper, we show how to adapt Hoplite to enable calculation of precise upper bounds on routing latency by modifying the routing function to prioritize deflections, and by regulating the injection of packets to meet certain throughput and burstiness constraints. We provide an analytical model for computing end-to-end latency in the form of (1) in-flight time in the network T<sup>f</sup>, and (2) waiting time at the source node T<sup>s</sup>. To bound in-flight time in an m × m NoC, we modify the routing function and switching crossbar richness in the Hoplite router to deliver T<sup>f</sup> = ΔX + ΔY + (ΔY × m) + 2, where ΔX and ΔY are the differences of the source and destination address coordinates of the packet. To bound the waiting time at the source, we add a Token Bucket regulator with rate ρ<sub>i</sub> and burstiness σ<sub>i</sub> for each flow f<sub>i</sub> of node (x, y) to deliver (⌈1/ρ<sub>i</sub>⌉ − 1) + T<sup>s</sup>, where T<sup>s</sup> = ⌈σ(Γ<sup>C</sup><sub>f</sub>) / (1 − ρ(Γ<sup>C</sup><sub>f</sub>))⌉, which depends on the regulator period 1/ρ<sub>i</sub> and on the burstiness σ and the rate ρ of all interfering flows Γ<sup>C</sup><sub>f</sub>. A 64b implementation of our HopliteRT router requires ≈4% fewer LUTs and a similar number of FFs compared to the original Hoplite router. We also need two small counters at each client port for regulating injection. We evaluate our model and RTL implementation across synthetic traffic patterns and observe behavior that conforms with the analytical bounds.
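The two bounds in the abstract can be evaluated with a few lines of Python. The following is a minimal sketch, not the authors' implementation, and it rests on several assumptions: the coordinate differences are taken modulo m (as they would be on a unidirectional torus), the denominator of the source-wait term is read as 1 − ρ(Γ), in line with the statement that the bound depends on the rate ρ of the interfering flows, and σ(Γ) and ρ(Γ) are passed in as already-aggregated values. The names `in_flight_bound`, `source_wait_bound`, and `TokenBucket` are hypothetical, and the regulator is a generic token bucket rather than the two-counter hardware design mentioned in the abstract.

```python
from fractions import Fraction
from math import ceil


def in_flight_bound(src, dst, m):
    """T^f = dX + dY + (dY * m) + 2 for an m x m NoC.

    Assumption: differences wrap around modulo m (unidirectional torus).
    """
    dx = (dst[0] - src[0]) % m
    dy = (dst[1] - src[1]) % m
    return dx + dy + dy * m + 2


def source_wait_bound(rho_i, sigma_conf, rho_conf):
    """(ceil(1/rho_i) - 1) + ceil(sigma(G) / (1 - rho(G))).

    Assumption: sigma_conf and rho_conf are the aggregate burstiness and
    rate of the interfering flows, supplied by the caller.
    """
    rho_i, rho_conf = Fraction(rho_i), Fraction(rho_conf)
    if not 0 < rho_i <= 1:
        raise ValueError("rho_i must be in (0, 1]")
    if not 0 <= rho_conf < 1:
        raise ValueError("aggregate interfering rate must be in [0, 1)")
    own_period_term = ceil(1 / rho_i) - 1
    interference_term = ceil(Fraction(sigma_conf) / (1 - rho_conf))
    return own_period_term + interference_term


class TokenBucket:
    """Generic token bucket: rate rho tokens per cycle, depth sigma."""

    def __init__(self, rho, sigma):
        self.rho = Fraction(rho)
        self.sigma = Fraction(sigma)
        self.tokens = Fraction(sigma)  # bucket starts full

    def tick(self):
        # Accrue rho tokens each cycle, capped at the burst size sigma.
        self.tokens = min(self.sigma, self.tokens + self.rho)

    def try_inject(self):
        # A packet may be injected only if a whole token is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    print(in_flight_bound((0, 0), (2, 1), m=4))
    print(source_wait_bound(Fraction(1, 4), 2, Fraction(1, 2)))
```

Exact rational arithmetic via `Fraction` is used so that the ceilings are not affected by floating-point rounding when rates such as 1/10 are involved.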
