Paper information - Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks

This paper introduces packet chaining, a simple and effective method for increasing allocator matching efficiency, and hence network performance, that is particularly suited to networks with short packets and short cycle times. Packet chaining chains together packets destined for the same output so that a new packet reuses the switch connection of a departing packet. This lets an allocator build up an efficient matching over several cycles, as incremental allocation does, but without being limited by packet length. For a 64-node 2D mesh at maximum injection rate with single-flit packets, packet chaining increases network throughput by 15% compared to a conventional single-iteration separable iSLIP allocator, outperforms a wavefront allocator, and achieves throughput comparable to that of an augmenting-paths allocator. Packet chaining achieves this performance with a cycle time comparable to that of a single-iteration separable allocator. It also reduces average network latency by 22.5%. Finally, packet chaining increases IPC by up to 46% (16% on average) for application benchmarks, because short packets are critical in a typical cache-coherent CMP. These are considerable improvements given the maturity of network-on-chip routers and allocators.
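The chaining idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation: the function names `separable_allocate` and `chained_allocate` are hypothetical, the arbiters are fixed-priority (lowest index wins) rather than the round-robin arbiters of iSLIP, and virtual channels, credits, and starvation control are ignored. It only shows how reusing a connection held in the previous cycle can yield a larger matching than a fresh single-iteration separable pass.

```python
def separable_allocate(requests, busy_inputs=(), busy_outputs=()):
    """One input-first separable pass (assumed fixed-priority arbiters).

    requests maps each input port to the set of output ports it has
    packets waiting for. Returns a dict {input: output} of grants.
    """
    # Input stage: each free input proposes its lowest-numbered free output.
    proposals = {}
    for inp in sorted(requests):
        if inp in busy_inputs:
            continue
        free = sorted(o for o in requests[inp] if o not in busy_outputs)
        if free:
            proposals.setdefault(free[0], []).append(inp)
    # Output stage: each output grants its lowest-numbered proposer.
    return {min(inps): out for out, inps in proposals.items()}


def chained_allocate(requests, held):
    """Reuse last cycle's connections where possible, then allocate the rest.

    held maps input -> output for connections used in the previous cycle.
    """
    # Chaining: keep a connection if its input has another packet
    # waiting for the same output.
    chained = {i: o for i, o in held.items() if o in requests.get(i, ())}
    # The remaining inputs and outputs go through the ordinary separable pass.
    fresh = separable_allocate(
        requests,
        busy_inputs=chained.keys(),
        busy_outputs=set(chained.values()),
    )
    return {**chained, **fresh}


# Both inputs have packets for both outputs.
requests = {0: {0, 1}, 1: {0, 1}}
print(len(separable_allocate(requests)))             # 1
print(len(chained_allocate(requests, held={1: 1})))  # 2
```

In this example both inputs propose output 0 in the separable pass, so output 1 stays idle and only one packet moves. When input 1 already holds a connection to output 1, chaining keeps it and the separable pass fills in the remaining pair, so both outputs are used.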

Nan Jiang | George Michelogiannakis | William J. Dally | Daniel Becker
