Low-cost guaranteed-throughput dual-ring communication infrastructure for heterogeneous MPSoCs

Connection-oriented Guaranteed-Throughput (GT) mesh-based Networks on Chip (NoCs) have been proposed as a replacement for buses in real-time stream processing systems, but they are rarely used because their hardware cost tends to be higher than that of conventional interconnects. Recently, an interconnect with a ring topology was introduced as a low-cost alternative for medium-scale homogeneous Multiple Processor System on Chip (MPSoC) designs. Cost-effective integration of stream processing accelerators requires an extension of this ring interconnect. We present a dual-ring communication infrastructure for heterogeneous MPSoC designs. Data and credits are transferred between tiles over two separate, oppositely directed rings. The minimum throughput is determined by analysis of a Cyclo-Static Data Flow (CSDF) model of a system with communication between accelerators and processors. The performance benefits and costs are evaluated by integrating our dual ring and an accelerator in a 16-core MPSoC mapped on a Virtex-6 FPGA. A real-time PAL video decoder is executed on this MPSoC. A performance gain of a factor of 3.6 was obtained at an increase in hardware cost of only 8.5%.
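To make the data/credit mechanism concrete, the following is a minimal Python sketch of credit-based flow control over two oppositely directed rings. It is a toy model, not the hardware or the CSDF model used in this work. All names (`simulate_credit_ring`, `buffer_words`, and so on) are hypothetical, and the timing assumptions are ours: one hop per cycle on each ring, at most one word injected per cycle, and a consumer that frees a buffer slot the moment a word arrives.

```python
from collections import deque


def simulate_credit_ring(num_tiles, src, dst, buffer_words, cycles):
    """Return the number of words delivered from tile src to tile dst.

    Assumed toy model: the data ring moves towards increasing tile
    indices, the credit ring moves the opposite way, and each hop
    takes one cycle on either ring.
    """
    if src == dst:
        raise ValueError("src and dst must be different tiles")

    # Hops on the data ring from src to dst. The credit ring runs in
    # the opposite direction, so a credit needs the same number of
    # hops to travel back from dst to src.
    hops = (dst - src) % num_tiles

    credits = buffer_words        # producer starts with one credit per buffer slot
    data_in_flight = deque()      # arrival cycles of words on the data ring
    credits_in_flight = deque()   # arrival cycles of credits on the credit ring
    delivered = 0

    for now in range(cycles):
        # Credits that have completed their trip back to the producer.
        while credits_in_flight and credits_in_flight[0] <= now:
            credits_in_flight.popleft()
            credits += 1

        # Words that reach the consumer; each one frees a slot, so a
        # credit is sent back on the credit ring.
        while data_in_flight and data_in_flight[0] <= now:
            data_in_flight.popleft()
            delivered += 1
            credits_in_flight.append(now + hops)

        # The producer may inject one word per cycle if it holds a credit.
        if credits > 0:
            credits -= 1
            data_in_flight.append(now + hops)

    return delivered


if __name__ == "__main__":
    for slots in (2, 4):
        words = simulate_credit_ring(num_tiles=8, src=0, dst=2,
                                     buffer_words=slots, cycles=100)
        print(f"buffer of {slots} words: {words} words delivered in 100 cycles")
```

In this toy model the credit round trip takes 2 * hops cycles, so a buffer of at least 2 * hops words sustains one word per cycle, while a smaller buffer limits the rate to roughly buffer_words / (2 * hops) words per cycle. The guaranteed minimum throughput of the actual design follows from the CSDF analysis, not from this sketch.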
