Paper Information - Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs

Emerging many-core chip multiprocessors (CMPs) will integrate dozens of small processing cores with an on-chip interconnect consisting of point-to-point links. The interconnect lets the processing cores not only communicate but also share common resources such as main memory and I/O controllers. In this work, we propose an arbitration scheme that provides equality of service (EoS) in access to a chip's shared resources; that is, we seek to remove any bias in a core's access to a shared resource that stems from its location in the CMP. We propose probabilistic arbitration combined with distance-based weights to achieve EoS and to overcome the limitations of conventional round-robin arbiters. We describe how nonlinear weights must be used with probabilistic arbiters and propose three arbitration weight metrics: fixed weight, constantly increasing weight, and variably increasing weight. Because only the arbitration logic of an on-chip router is modified, the scheme requires no additional buffers or virtual channels, making it a simple, low-cost mechanism for achieving EoS. We evaluate our arbitration scheme across a wide range of traffic patterns. In addition to providing EoS, the proposed arbitration offers quality-of-service features (such as differentiated service) and provides fairness in both throughput and latency that approaches the global fairness of age-based arbitration. As a result, the network is more stable and sustains high throughput beyond saturation.
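The location bias and the need for nonlinear weights can be illustrated with a small model. The Python sketch below is not the paper's method and does not reproduce its three weight metrics. It assumes a hypothetical "parking-lot" chain: node j injects at router j, routers 0..n-1 forward toward one shared resource placed after router n-1 (so node 0 is farthest), every source is saturated, each router's upstream input buffers at most one packet, and each packet carries a fixed integer weight chosen by its source. `lottery_grant` is a lottery-style probabilistic arbiter; `simulate_parking_lot` and `predicted_shares` are hypothetical helpers that measure and predict each node's share of the resource.

```python
import random
from fractions import Fraction


def lottery_grant(requests, rng):
    """Probabilistic (lottery) arbitration: grant one requester with
    probability weight / sum of all weights. `requests` is a list of
    (requester_id, weight) pairs."""
    total = sum(w for _, w in requests)
    ticket = rng.random() * total
    for req_id, w in requests:
        if ticket < w:
            return req_id
        ticket -= w
    return requests[-1][0]  # guard against floating-point round-off


def simulate_parking_lot(weights, cycles, seed=0):
    """Hypothetical saturated chain: node j injects at router j, routers
    0..n-1 forward toward one shared resource after router n-1, and each
    router's upstream input holds at most one packet. weights[j] is the
    arbitration weight carried by node j's packets. Returns each node's
    fraction of the packets delivered to the resource."""
    rng = random.Random(seed)
    n = len(weights)
    buf = [None] * n  # buf[i]: source of the packet waiting at router i's upstream input
    delivered = [0] * n
    for _ in range(cycles):
        # Visit routers downstream-first so a slot freed this cycle is refilled this cycle.
        for i in range(n - 1, -1, -1):
            if i < n - 1 and buf[i + 1] is not None:
                continue  # output slot still occupied: this router stalls this cycle
            requests = [(i, weights[i])]  # the saturated local source always requests
            if buf[i] is not None:
                requests.append((buf[i], weights[buf[i]]))
            winner = lottery_grant(requests, rng)
            if winner != i:
                buf[i] = None  # the upstream packet won and leaves its buffer
            if i == n - 1:
                delivered[winner] += 1
            else:
                buf[i + 1] = winner
    total = sum(delivered)
    return [d / total for d in delivered]


def predicted_shares(weights):
    """Steady-state shares for the same model (integer weights). At router i,
    a waiting upstream packet from node j loses a geometric number of draws
    to local node i, with mean weights[i] / weights[j]."""
    counts = [Fraction(1)]  # relative number of packets from node 0
    for i in range(1, len(weights)):
        counts.append(sum(c * Fraction(weights[i], weights[j])
                          for j, c in enumerate(counts)))
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    for w in ([1, 1, 1, 1], [4, 4, 2, 1]):
        exact = [str(s) for s in predicted_shares(w)]
        simulated = [round(s, 3) for s in simulate_parking_lot(w, cycles=20_000)]
        print(w, exact, simulated)
```

In this model, equal weights make the probabilistic arbiter behave like round-robin: four nodes receive 1/8, 1/8, 1/4 and 1/2 of the resource, so the farthest nodes are starved. The hypothetical distance-based weights `[4, 4, 2, 1]` give every node 1/4. Requiring equal shares in `predicted_shares` yields the condition 1/w_i = (sum over j < i of 1/w_j), which forces each node's weight to double for every additional hop of distance (node 0 ties with node 1 because its first router has no competitor). Within this toy model, that is one way to see why linearly growing weights are not enough.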
