Recursive partitioning multicast: A bandwidth-efficient routing for Networks-on-Chip

Chip Multi-processor (CMP) architectures have become mainstream for designing processors. With a large number of cores, Networks-on-Chip (NOCs) provide a scalable communication method for CMP architectures. NOCs must be carefully designed to meet constraints of power consumption and area, and provide ultra low latencies. Existing NOCs mostly use Dimension Order Routing (DOR) to determine the route taken by a packet in unicast traffic. However, with the development of diverse applications in CMPs, one-to-many (multicast) and one-to-all (broadcast) traffic are becoming more common. Current unicast routing cannot support multicast and broadcast traffic efficiently. In this paper, we propose Recursive Partitioning Multicast (RPM) routing and a detailed multicast wormhole router design for NOCs. RPM allows routers to select intermediate replication nodes based on the global distribution of destination nodes. This provides more path diversities, thus achieves more bandwidth-efficiency and finally improves the performance of the whole network. Our simulation results using a detailed cycle-accurate simulator show that compared with the most recent multicast scheme, RPM saves 25% of crossbar and link power, and 33% of link utilization with 50% network performance improvement. Also RPM is more scalable to large networks than the recently proposed VCTM.

[1]  Lionel M. Ni,et al.  Multi-address Encoding for Multicast , 1994, PCRCW.

[2]  Anant Agarwal,et al.  Scalar operand networks: on-chip interconnect for ILP in partitioned architectures , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[3]  Ken Mai,et al.  The future of wires , 2001, Proc. IEEE.

[4]  Henry Hoffmann,et al.  On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.

[5]  Karthikeyan Sankaralingam,et al.  Implementation and Evaluation of a Dynamically Routed Processor Operand Network , 2007, First International Symposium on Networks-on-Chip (NOCS'07).

[6]  Sharad Malik,et al.  Orion: a power-performance simulator for interconnection networks , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[7]  Xiaola Lin,et al.  Multicast Communication in Multicomputer Networks , 1993, ICPP.

[8]  William J. Dally,et al.  Research Challenges for On-Chip Interconnection Networks , 2007, IEEE Micro.

[9]  Jo-Mei Chang,et al.  Reliable broadcast protocols , 1984, TOCS.

[10]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[11]  Natalie D. Enright Jerger,et al.  Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support , 2008, 2008 International Symposium on Computer Architecture.

[12]  Hector Garcia-Molina,et al.  Message ordering in a multicast environment , 1989, [1989] Proceedings. The 9th International Conference on Distributed Computing Systems.

[13]  Sriram R. Vangal,et al.  A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.

[14]  Josep Torrellas,et al.  An efficient implementation of tree-based multicast routing for distributed shared-memory multiprocessors , 1996, Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing.

[15]  Mainak Chaudhuri,et al.  Exploring virtual network selection algorithms in DSM cache coherence protocols , 2004, IEEE Transactions on Parallel and Distributed Systems.

[16]  José Duato,et al.  Efficient unicast and multicast support for CMPs , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[17]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[18]  William J. Dally,et al.  Deadlock-Free Message Routing in Multiprocessor Interconnection Networks , 1987, IEEE Transactions on Computers.