Guaranteed Services of the NoC of a Manycore Processor

The Kalray MPPA®-256 processor (Multi-Purpose Processing Array) integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28 nm CMOS chip. These cores are distributed across 16 compute clusters and 4 I/O subsystems. On-chip communication and synchronization are supported by an explicitly routed dual data and control network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem, for a total of 32 nodes. The data NoC is dedicated to streaming data transfers and can operate with guaranteed services, thanks to non-blocking routers and flow regulation at the source node. Its architecture has been designed so that (σ, ρ) network calculus applies with minimal approximations. Given a set of flows across this data NoC with predetermined routes, we formulate the problem of guaranteeing a fair allocation of bandwidth across flows, and we present bounds on the maximum transfer latency. By considering the architecture of the data NoC and by introducing conservative approximations, we show how this formulation can be transformed into a linear program. Solving this linear program is efficient, and the quality of its solutions appears comparable to that of the original formulation, based on problem instances obtained from the cyclostatic dataflow compilation toolchain of the Kalray MPPA®-256 processor.
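To make the two quantities in the abstract concrete, the following Python sketch illustrates them on a toy instance. It is not the linear program of the paper: the fair allocation is computed with the textbook progressive-filling procedure for max-min fairness, and the latency bound is the standard network-calculus result for a (σ, ρ)-constrained flow crossing a single rate-latency server. The function names, the link and flow names, and the units (flits and cycles) are all assumptions made for illustration.

```python
from fractions import Fraction


def max_min_fair_rates(routes, capacity):
    """Max-min fair rates by progressive filling (illustrative stand-in).

    routes:   dict mapping a flow name to the list of links it traverses
              (hypothetical names; every flow must use at least one link)
    capacity: dict mapping a link name to its capacity, e.g. flits per cycle
    """
    rates = {flow: Fraction(0) for flow in routes}
    residual = {link: Fraction(c) for link, c in capacity.items()}
    active = set(routes)
    while active:
        # Count the flows still growing on each link.
        load = {l: sum(1 for f in active if l in routes[f]) for l in residual}
        # Largest equal increment before some link saturates.
        step = min(residual[l] / n for l, n in load.items() if n > 0)
        for f in active:
            rates[f] += step
        for l, n in load.items():
            residual[l] -= step * n
        # Freeze every flow that crosses a link saturated in this round.
        saturated = {l for l, n in load.items() if n > 0 and residual[l] == 0}
        active = {f for f in active
                  if not any(l in saturated for l in routes[f])}
    return rates


def delay_bound(sigma, rho, rate, latency):
    """Delay bound for a (sigma, rho) flow through a rate-latency server.

    Arrival curve: sigma + rho * t.  Service curve: rate * max(t - latency, 0).
    The bound latency + sigma / rate holds only when rho <= rate.
    """
    if rho > rate:
        raise ValueError("unstable: sustained rate exceeds service rate")
    return latency + Fraction(sigma) / Fraction(rate)


if __name__ == "__main__":
    # Hypothetical flows sharing two unit-capacity links.
    routes = {"a": ["l1", "l2"], "b": ["l1"], "c": ["l2"], "d": ["l2"]}
    capacity = {"l1": 1, "l2": 1}
    print(max_min_fair_rates(routes, capacity))
    # Burst of 8 flits, rate 1/3 flit/cycle, server of 1 flit/cycle, 10 cycles.
    print(delay_bound(8, Fraction(1, 3), 1, 10))
```

In this toy instance, link `l2` is shared by three flows and saturates first, so flows `a`, `c` and `d` each receive 1/3 of a flit per cycle, after which `b` takes the remaining 2/3 on `l1`. Exact fractions are used so that saturation can be tested by equality rather than with a floating-point tolerance.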
