On an efficient NoC multicasting scheme in support of multiple applications running on irregular sub-networks

When a number of applications run simultaneously on a many-core chip multiprocessor (CMP) connected through a network-on-chip (NoC), a significant amount of on-chip traffic is one-to-many (multicast) in nature. Moreover, when multiple applications are mapped onto an NoC architecture under traffic isolation constraints, the sub-networks that these applications are mapped onto tend to be irregular. In the literature, multicasting for irregular topologies is supported through either multiple unicasting or broadcasting, which, unfortunately, results in overly high power consumption and/or long network latency. To address this problem, a simple yet efficient hardware-based multicasting scheme is proposed in this paper. First, an irregular-oriented multicast strategy is proposed. Following this strategy, an irregular-oriented multicast routing algorithm can be designed based on any multicast routing algorithm for regular meshes. One such algorithm, Alternative Recursive Partitioning Multicasting (AL+RPM), is proposed based on RPM, which was originally designed for regular mesh topologies. The basic idea of AL+RPM is to find the output directions by following the basic RPM algorithm and then, based on the shape of the sub-network, decide whether to replicate the packets to the original output directions or to the alternative (AL) output directions. Experimental results show that the proposed AL+RPM multicast algorithm consumes, on average, 14% and 20% less power than bLBDR (a broadcasting-based routing algorithm) and the multiple unicast scheme, respectively. In addition, AL+RPM has much lower network latency than these two approaches. The area overhead of incorporating AL+RPM into a baseline router to support multicasting is fairly modest, at less than 5.5%.
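The "original or alternative output direction" decision described above can be illustrated with a minimal Python sketch. It is not the AL+RPM algorithm from the paper: the base routing function is a plain X-then-Y stand-in rather than RPM, and the fallback rule (take the orthogonal direction that stays inside the sub-network and ends closest to the destination) is an assumption made for illustration. All names (`base_direction`, `choose_outputs`, `ALTERNATIVES`) are hypothetical, and the sketch makes no claim about deadlock or livelock freedom.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (x, y) position in the underlying mesh

# Unit steps for the four mesh output directions.
STEP: Dict[str, Node] = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

# Assumed fallback candidates: the two directions orthogonal to the original.
ALTERNATIVES: Dict[str, List[str]] = {
    "E": ["N", "S"], "W": ["N", "S"], "N": ["E", "W"], "S": ["E", "W"],
}


def neighbor(cur: Node, direction: str) -> Node:
    """Return the mesh node one hop from `cur` in `direction`."""
    dx, dy = STEP[direction]
    return (cur[0] + dx, cur[1] + dy)


def base_direction(cur: Node, dst: Node) -> str:
    """Stand-in for the regular-mesh algorithm (NOT RPM): X first, then Y."""
    if dst[0] != cur[0]:
        return "E" if dst[0] > cur[0] else "W"
    return "N" if dst[1] > cur[1] else "S"


def distance(a: Node, b: Node) -> int:
    """Manhattan distance between two mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_outputs(cur: Node, dests: List[Node],
                   subnet: Set[Node]) -> Dict[str, List[Node]]:
    """Group destinations by output port, one packet replica per port.

    The original direction is kept when its neighbor lies inside the
    (possibly irregular) sub-network; otherwise an alternative is chosen.
    """
    outputs: Dict[str, List[Node]] = {}
    for dst in dests:
        if dst == cur:
            continue  # delivered to the local port
        direction = base_direction(cur, dst)
        if neighbor(cur, direction) not in subnet:
            # Original direction leaves the sub-network: use an alternative.
            candidates = [d for d in ALTERNATIVES[direction]
                          if neighbor(cur, d) in subnet]
            if not candidates:
                raise ValueError(f"no usable output from {cur} toward {dst}")
            direction = min(candidates,
                            key=lambda d: distance(neighbor(cur, d), dst))
        outputs.setdefault(direction, []).append(dst)
    return outputs


# L-shaped (irregular) sub-network carved out of a 3x3 mesh.
subnet = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)}
print(choose_outputs((0, 1), [(2, 0), (0, 2)], subnet))
```

In this example, node (0, 1) cannot forward east toward (2, 0) because (1, 1) lies outside the L-shaped sub-network, so that destination is assigned to the south port, while (0, 2) keeps its original north port. Destinations that share a port share a single replica, which is the saving multicast offers over multiple unicasts.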
