Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs

Emerging many-core chip multiprocessors will integrate dozens of small processing cores with an on-chip interconnect consisting of point-to-point links. The interconnect enables the processing cores to not only communicate, but to share common resources such as main memory resources and I/O controllers. In this work, we propose an arbitration scheme to enable equality of service (EoS) in access to a chip’s shared resources. That is, we seek to remove any bias in a core’s access to a shared resource based on its location in the CMP. We propose using probabilistic arbitration combined with distance-based weights to achieve EoS and overcome the limitation of conventional round-robin arbiter. We describe how nonlinear weights need to be used with probabilistic arbiters and propose three different arbitration weight metrics – fixed weight, constantly increasing weight, and variably increasing weight. By only modifying the arbitration of an on-chip router, we do not require any additional buffers or virtual channels and create a simple, low-cost mechanism for achieving EoS. We evaluate our arbitration scheme across a wide range of traffic patterns. In addition to providing EoS, the proposed arbitration has additional benefits which include providing quality-of-service features (such as differentiated service) and providing fairness in terms of both throughput and latency that approaches the global fairness achieved with age-base arbitration – thus, providing a more stable network by achieving high sustained throughput beyond saturation.

[1]  Ran Ginosar,et al.  QNoC: QoS architecture and design process for network on chip , 2004, J. Syst. Archit..

[2]  A. Raghunathan,et al.  LOTTERYBUS: a new high-performance communication architecture for system-on-chip designs , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[3]  Wolf-Dietrich Weber,et al.  A quality-of-service mechanism for interconnection networks in system-on-chips , 2005, Design, Automation and Test in Europe.

[4]  William E. Weihl,et al.  Lottery scheduling: flexible proportional-share resource management , 1994, OSDI '94.

[5]  Lixia Zhang,et al.  Virtual Clock: A New Traffic Control Algorithm for Packet Switching Networks , 1990, SIGCOMM.

[6]  Niraj K. Jha,et al.  A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007, ICCD.

[7]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[8]  Calvin Lin,et al.  Adaptive History-Based Memory Schedulers , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[9]  Saurabh Dighe,et al.  An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[10]  Deborah K. Weisser,et al.  Age-based packet arbitration in large-radix k-ary n-cubes , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[11]  Onur Mutlu,et al.  Preemptive Virtual Clock: A flexible, efficient, and cost-effective QOS scheme for networks-on-chip , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12]  J.H. Kim,et al.  Rotating Combined Queueing (RCQ): Bandwidth and Latency Guarantees in Low-Cost, High-Performance Networks , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[13]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[14]  A. Kumary,et al.  A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007 .

[15]  Sharad Malik,et al.  Power-driven design of router microarchitectures in on-chip networks , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[16]  Krste Asanovic,et al.  Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks , 2008, 2008 International Symposium on Computer Architecture.

[17]  Kees Goossens,et al.  AEthereal network on chip: concepts, architectures, and implementations , 2005, IEEE Design & Test of Computers.

[18]  Chita R. Das,et al.  Aérgia: exploiting packet latency slack in on-chip networks , 2010, ISCA.

[19]  Anujan Varma,et al.  Design and analysis of frame-based fair queueing: a new traffic scheduling algorithm for packet-switched networks , 1996, SIGMETRICS '96.

[20]  Natalie D. Enright Jerger,et al.  Achieving predictable performance through better memory controller placement in many-core CMPs , 2009, ISCA '09.

[21]  Chita R. Das,et al.  MediaWorm: A QoS Capable Router Architecture for Clusters , 2002, IEEE Trans. Parallel Distributed Syst..

[22]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[23]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[24]  Doug Burger,et al.  Implementation and Evaluation of On-Chip Network Architectures , 2006, 2006 International Conference on Computer Design.

[25]  Scott Shenker,et al.  Analysis and simulation of a fair queueing algorithm , 1989, SIGCOMM 1989.

[26]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.