Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks

Future chip multiprocessors (CMPs) may have hundreds to thousands of threads competing to access shared resources, and will require quality-of-service (QoS) support to improve system utilization. Although there has been significant work in QoS support within resources such as caches and memory controllers, there has been less attention paid to QoS support in the multi-hop on-chip networks that will form an important component in future systems. In this paper we introduce globally-synchronized frames (GSF), a framework for providing guaranteed QoS in on-chip networks in terms of minimum bandwidth and a maximum delay bound. The GSF framework can be easily integrated in a conventional virtual channel (VC) router without significantly increasing the hardware complexity. We rely on a fast barrier network, which is feasible in an on-chip environment, to efficiently implement GSF. Performance guarantees are verified by both analysis and simulation. According to our simulations, all concurrent flows receive their guaranteed minimum share of bandwidth in compliance with a given bandwidth allocation. The average throughput degradation of GSF on a 8times8 mesh network is within 10% compared to the conventional best-effort VC router in most cases.

[1]  J. Turner,et al.  New directions in communications (or which way to the information age?) , 1986, IEEE Communications Magazine.

[2]  Scott Shenker,et al.  Analysis and simulation of a fair queueing algorithm , 1989, SIGCOMM '89.

[3]  William J. Dally,et al.  Virtual-channel flow control , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[4]  S. J. Golestani A stop-and-go queueing framework for congestion management , 1990, SIGCOMM 1990.

[5]  Lixia Zhang VirtualClock: a new traffic control algorithm for packet-switched networks , 1991, TOCS.

[6]  Srinivasan Keshav,et al.  Comparison of rate-based service disciplines , 1991, SIGCOMM '91.

[7]  ZhangHui,et al.  Comparison of rate-based service disciplines , 1991 .

[8]  Rene L. Cruz,et al.  A calculus for network delay, Part I: Network elements in isolation , 1991, IEEE Trans. Inf. Theory.

[9]  Moti Yung,et al.  The integrated MetaNet architecture: a switch-based multimedia LAN for parallel computing and real-time traffic , 1994, Proceedings of INFOCOM '94 Conference on Computer Communications.

[10]  J.H. Kim,et al.  Rotating Combined Queueing (RCQ): Bandwidth and Latency Guarantees in Low-Cost, High-Performance Networks , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[11]  Anujan Varma,et al.  Design and analysis of frame-based fair queueing: a new traffic scheduling algorithm for packet-switched networks , 1996, SIGMETRICS '96.

[12]  Mike Galles Spider: a high-speed network interconnect , 1997, IEEE Micro.

[13]  Nick McKeown,et al.  The iSLIP scheduling algorithm for input-queued switches , 1999, TNET.

[14]  Mithuna Thottethodi,et al.  Self-tuned congestion control for multiprocessor networks , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[15]  Chita R. Das,et al.  MediaWorm: A QoS Capable Router Architecture for Clusters , 2002, IEEE Trans. Parallel Distributed Syst..

[16]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[17]  J. Turner New directions in communications (or which way to the information age?) , 2002, IEEE Communications Magazine.

[18]  Axel Jantsch,et al.  Guaranteed bandwidth using looped containers in temporally disjoint networks within the nostrum network on chip , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[19]  Stephen B. Furber,et al.  An asynchronous on-chip network router with quality-of-service (QoS) support , 2004, IEEE International SOC Conference, 2004. Proceedings..

[20]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[21]  Ran Ginosar,et al.  QNoC: QoS architecture and design process for network on chip , 2004, J. Syst. Archit..

[22]  Ravi R. Iyer,et al.  CQoS: a framework for enabling QoS in shared caches of CMP platforms , 2004, ICS '04.

[23]  Kees Goossens,et al.  AEthereal network on chip: concepts, architectures, and implementations , 2005, IEEE Design & Test of Computers.

[24]  Jens Sparsø,et al.  A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip , 2005, Design, Automation and Test in Europe.

[25]  Wolf-Dietrich Weber,et al.  A quality-of-service mechanism for interconnection networks in system-on-chips , 2005, Design, Automation and Test in Europe.

[26]  Srihari Makineni,et al.  Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[27]  James E. Smith,et al.  Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[28]  Krste Asanovic,et al.  METERG: Measurement-Based End-to-End Performance Estimation Technique in QoS-Capable Multiprocessors , 2006, 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06).

[29]  Guilherme Ottoni,et al.  Global Multi-Threaded Instruction Scheduling , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[30]  Onur Mutlu,et al.  Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems , 2007, USENIX Security Symposium.

[31]  Niraj K. Jha,et al.  Express virtual channels: towards the ideal interconnection fabric , 2007, ISCA '07.

[32]  Daniel A. Brokenshire,et al.  Introduction to the Cell Broadband Engine Architecture , 2007, IBM J. Res. Dev..

[33]  Yan Solihin,et al.  A Framework for Providing Quality of Service in Chip Multi-Processors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[34]  Deborah K. Weisser,et al.  Age-based packet arbitration in large-radix k-ary n-cubes , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[35]  Christoforos E. Kozyrakis,et al.  From chaos to QoS: case studies in CMP resource management , 2007, CARN.

[36]  James E. Smith,et al.  Virtual private caches , 2007, ISCA '07.

[37]  James E. Smith,et al.  Virtual private machines: a resource abstraction for multicore computer systems , 2009 .

[38]  Onur Mutlu,et al.  Preemptive Virtual Clock: A flexible, efficient, and cost-effective QOS scheme for networks-on-chip , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[39]  John Kim,et al.  Low-cost router microarchitecture for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[40]  Resource Management in the Tessellation Manycore OS ∗ , 2010 .

[41]  John Kim,et al.  Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.