LOFT: A High Performance Network-on-Chip Providing Quality-of-Service Support

Providing quality-of-service (QoS) for concurrent tasks in many-core architectures is becoming important, especially for real-time applications. QoS support for on-chip shared resources (such as shared cache, bus, and memory controllers)in chip-multiprocessors has been investigated in recent years. Unlike other shared resources, network-on-chip (NoC) does not typically have central arbitration of accesses to the shared resource. Instead, each router shares the responsibility of resource allocation. While such distributed nature benefits the scalable performance of NoC, it also dramatically complicates the problem of providing QoS support for individual flows. Existing approaches to address this problem suffer from various shortcomings such as low network utilization and weak QoS guarantees. In this work, we propose LOFT No architecture which features both high network utilization and strong QoS guarantees. LOFT is based on the combination of two mechanisms: a) locally-synchronized frames (LSF), which is a distributed frame-based scheduling mechanism that provides flexible QoS guarantees to different flows and b)flit-reservation (FRS), which is a flow-control mechanism integrated in LSF that improves network utilization. The experimental results show that LOFT delivers flexible and reliable QoS guarantees while sufficiently utilizes available network capacity to gain high overall throughput.

[1]  Wolf-Dietrich Weber,et al.  A quality-of-service mechanism for interconnection networks in system-on-chips , 2005, Design, Automation and Test in Europe.

[2]  Timothy Mattson,et al.  A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[3]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[4]  O. Mutlu,et al.  Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems , 2010, ASPLOS XV.

[5]  Jens Sparsø,et al.  A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip , 2005, Design, Automation and Test in Europe.

[6]  Srinivasan Keshav,et al.  Comparison of rate-based service disciplines , 1991, SIGCOMM '91.

[7]  J.H. Kim,et al.  Rotating Combined Queueing (RCQ): Bandwidth and Latency Guarantees in Low-Cost, High-Performance Networks , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[8]  Bobby Bodenheimer,et al.  Synthesis and evaluation of linear motion transitions , 2008, TOGS.

[9]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[10]  Edward T. Grochowski,et al.  Larrabee: A many-Core x86 architecture for visual computing , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[11]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[12]  James E. Smith,et al.  Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[13]  Onur Mutlu,et al.  Preemptive Virtual Clock: A flexible, efficient, and cost-effective QOS scheme for networks-on-chip , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[14]  William J. Dally,et al.  Flit-reservation flow control , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[15]  Mahmut T. Kandemir,et al.  A case for integrated processor-cache partitioning in chip multiprocessors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[16]  Chita R. Das,et al.  Design and analysis of an NoC architecture from performance, reliability and energy perspective , 2005, 2005 Symposium on Architectures for Networking and Communications Systems (ANCS).

[17]  Krste Asanovic,et al.  Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks , 2008, 2008 International Symposium on Computer Architecture.

[18]  Kees Goossens,et al.  AEthereal network on chip: concepts, architectures, and implementations , 2005, IEEE Design & Test of Computers.

[19]  Mikko H. Lipasti,et al.  Light speed arbitration and flow control for nanophotonic interconnects , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[20]  Ravi R. Iyer,et al.  CQoS: a framework for enabling QoS in shared caches of CMP platforms , 2004, ICS '04.

[21]  Axel Jantsch,et al.  Guaranteed bandwidth using looped containers in temporally disjoint networks within the nostrum network on chip , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[22]  Chita R. Das,et al.  Application-aware prioritization mechanisms for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[23]  Tao Li,et al.  Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[24]  William J. Dally Virtual-Channel Flow Control , 1992, IEEE Trans. Parallel Distributed Syst..

[25]  James E. Smith,et al.  Virtual private caches , 2007, ISCA '07.

[26]  Timothy G. Mattson,et al.  Programming the Intel 80-core network-on-a-chip Terascale Processor , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.