TPSS: A Flexible Hardware Support for Unicast and Multicast on Network-on-Chip

Multicast is an important traffic mode on multi-core systems, and efficient hardware support for multicast can greatly improve the performance of the whole system. Most multicast solutions use dimension-order routing to generate the multicast trees, which are neither bandwidth- nor power-efficient. This article presents a synthesizable router for network-on-chip (NoC) that supports arbitrarily shaped multicast paths on a mesh topology. In our scheme, incremental setup is adopted to simplify multicast tree construction: the tree is built one sub-path at a time. For each sub-path setup, we present a novel scheme called two period sub-path setup (TPSS). TPSS is divided into two periods: routing to a predetermined intermediate router, and updating the lookup tables from the intermediate router to the destination. This setup makes it feasible to support arbitrarily shaped paths. In our case study, the Optimized tree algorithm (OPT) and the Left-XY-Right-Optimized tree algorithm (LXYROPT) are proposed for power-efficient path searching, but their paths need to be pre-configured because of the high computation cost. Moreover, Virtual Circuit Tree Multicasting (VCTM) is also supported in our scheme for dynamic construction of multicast paths, which needs no computation in path searching. The performance is evaluated using a cycle-accurate simulator developed in SystemC, and the hardware overhead is estimated using a synthesizable HDL model. Compared to VCTM (without FIFO, multicast table and network adapter), the area overhead of implementing our router is negligible (less than 0.5%).
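The contrast between a dimension-order multicast tree and an incrementally built, arbitrarily shaped tree can be illustrated with a small software sketch. It is not the router's implementation: the function names, the dictionary-based lookup tables, the `"local"` port label, and the use of XY routing in the first period are all assumptions made for illustration. The abstract only states that the first period routes a setup packet to an intermediate router and that the second period updates lookup tables from there to the destination.

```python
def xy_path(src, dst):
    """Routers visited by a dimension-order (X first, then Y) route on a mesh."""
    (x, y), (dx, dy) = src, dst
    hops = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops


def tree_links(paths):
    """Set of directed links used by the union of several paths."""
    links = set()
    for path in paths:
        links.update(zip(path, path[1:]))
    return links


def tpss_setup(tables, tree_id, src, intermediate, sub_path):
    """Hypothetical two-period setup of one sub-path.

    Period 1: the setup packet is routed from src to the intermediate
    router (XY routing here is an assumption); no table is written.
    Period 2: every router on sub_path records its next hop for tree_id.
    """
    if sub_path[0] != intermediate:
        raise ValueError("sub_path must start at the intermediate router")
    approach = xy_path(src, intermediate)  # period 1
    for here, nxt in zip(sub_path, sub_path[1:]):  # period 2
        tables.setdefault(here, {}).setdefault(tree_id, set()).add(nxt)
    # The last router delivers the packet to its local core.
    tables.setdefault(sub_path[-1], {}).setdefault(tree_id, set()).add("local")
    return approach


def table_links(tables, tree_id):
    """Number of router-to-router links recorded for tree_id."""
    return sum(
        1
        for ports in tables.values()
        for port in ports.get(tree_id, ())
        if port != "local"
    )


src = (0, 0)
# Dimension-order tree: union of the XY routes to both destinations.
xy_tree = tree_links([xy_path(src, (1, 2)), xy_path(src, (2, 2))])

# Incremental setup: the second sub-path branches off at router (1, 2).
tables = {}
tpss_setup(tables, 0, src, src, [(0, 0), (1, 0), (1, 1), (1, 2)])
tpss_setup(tables, 0, src, (1, 2), [(1, 2), (2, 2)])

print(len(xy_tree), table_links(tables, 0))  # prints "6 4"
```

In this toy case the dimension-order tree occupies six links, while the tree built from two sub-paths occupies four, because the second destination is reached by extending the first branch instead of opening a separate route from the source.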
