Implementing tree-based multicast routing for write invalidation messages in networks-on-chip

Distributed shared-memory systems that use a directory-based protocol commonly send write invalidations as unicast messages. Unicast serializes the write invalidation transactions, which increases network traffic and latency. This paper proposes an efficient multicast router for single-flit write invalidation messages in on-chip networks. Multicast routing follows a tree-based scheme with bit-string multidestination encoding. We implemented the tree-based write invalidation router in IBM 90 nm technology. In network simulation, the proposed design showed 10.5% lower latency and 3.2% lower energy consumption than the unicast and dual-path routers.
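As a rough illustration of tree-based multicast with bit-string multidestination encoding, the following Python sketch models a single invalidation message whose header carries one bit per node. It is not the router described in the paper: the 2-D mesh topology, the XY (dimension-order) branching rule, the node numbering `y * width + x`, and all function names are assumptions made only for this example.

```python
# Illustrative sketch only. Mesh topology, XY routing, node numbering and
# all names below are assumptions, not details taken from the paper.

STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}

def encode(dests, width):
    """Pack (x, y) destinations into a bit string; bit y*width + x marks a destination."""
    mask = 0
    for x, y in dests:
        mask |= 1 << (y * width + x)
    return mask

def split_by_port(mask, here, width, height):
    """Partition the destination bit string among the output ports of router `here`.

    Assumed rule: correct the X offset first, then the Y offset (XY routing).
    Returns only the ports that receive at least one destination.
    """
    hx, hy = here
    ports = {"local": 0, "east": 0, "west": 0, "north": 0, "south": 0}
    for node in range(width * height):
        if not (mask >> node) & 1:
            continue
        x, y = node % width, node // width
        if x > hx:
            port = "east"
        elif x < hx:
            port = "west"
        elif y > hy:
            port = "north"
        elif y < hy:
            port = "south"
        else:
            port = "local"
        ports[port] |= 1 << node
    return {p: m for p, m in ports.items() if m}

def multicast_links(mask, src, width, height):
    """Count link traversals when one message is replicated at tree branch points."""
    links = 0
    frontier = [(src, mask)]
    while frontier:
        here, sub_mask = frontier.pop()
        for port, branch in split_by_port(sub_mask, here, width, height).items():
            if port == "local":
                continue  # delivered to the attached node, no link used
            dx, dy = STEP[port]
            links += 1
            frontier.append(((here[0] + dx, here[1] + dy), branch))
    return links

def unicast_links(dests, src):
    """Count link traversals when each destination gets its own message (minimal paths)."""
    return sum(abs(x - src[0]) + abs(y - src[1]) for x, y in dests)

if __name__ == "__main__":
    sharers = [(3, 0), (3, 1), (3, 2)]
    mask = encode(sharers, width=4)
    print(multicast_links(mask, (0, 0), 4, 4), unicast_links(sharers, (0, 0)))
```

In this toy case the three sharers lie in one column, so the multicast copy shares the path along the bottom row and branches only at the end, using fewer link traversals than three separate unicast messages. The link count is only a proxy for traffic and says nothing about the latency or energy figures reported above.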
