Design of High-Radix Clos Network-on-Chip

Many high-radix Network-on-Chip (NOC) topologies have been proposed to improve network performance as the number of processing elements (PEs) on a chip keeps growing. We believe the Clos Network-on-Chip (CNOC) is the most promising of these because of its low average hop count and good load-balancing characteristics. In this paper, we propose (1) a high-radix router architecture that combines a Virtual Output Queue (VOQ) buffer structure with a Packet Mode Dual Round-Robin Matching (PDRRM) scheduling algorithm to achieve high speed and high throughput in CNOC, and (2) a heuristic floor-planning algorithm that minimizes the power consumed by long wires. Experimental results show that the throughput of a 64-node, 3-stage CNOC under uniform traffic increases from 62% to 78% when the baseline routers are replaced with PDRRM VOQ routers. We also compared CNOC with other NOC topologies and found that, with the new design techniques, CNOC has the highest throughput, the lowest zero-load latency, and the best power efficiency.
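To make the VOQ and round-robin matching idea concrete, the following is a minimal sketch of one scheduling round in the style of dual round-robin matching. It is not the PDRRM router proposed in this paper: the packet-mode behavior (holding a match until a whole packet has passed) is left out, and the class name `VOQSwitch`, its methods, and the pointer-update details are assumptions made for illustration only.

```python
from collections import deque


class VOQSwitch:
    """Hypothetical N x N switch with one virtual output queue per (input, output)."""

    def __init__(self, n):
        self.n = n
        # voq[i][o] holds flits waiting at input i for output o.
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        self.req_ptr = [0] * n    # per-input round-robin pointer over outputs
        self.grant_ptr = [0] * n  # per-output round-robin pointer over inputs

    def enqueue(self, inp, out, flit):
        self.voq[inp][out].append(flit)

    def schedule(self):
        """Run one request/grant round and return the matches as {input: output}."""
        n = self.n
        requests = {}  # output -> inputs requesting it in this round
        # Request phase: each input picks its first non-empty VOQ,
        # searching from its own round-robin pointer.
        for i in range(n):
            for k in range(n):
                o = (self.req_ptr[i] + k) % n
                if self.voq[i][o]:
                    requests.setdefault(o, []).append(i)
                    break
        # Grant phase: each output picks the requesting input closest to
        # its pointer; pointers advance only for granted pairs (assumed rule).
        match = {}
        for o, inputs in requests.items():
            winner = min(inputs, key=lambda i: (i - self.grant_ptr[o]) % n)
            match[winner] = o
            self.grant_ptr[o] = (winner + 1) % n
            self.req_ptr[winner] = (o + 1) % n
        return match

    def step(self):
        """Schedule one round and dequeue one flit per matched input."""
        match = self.schedule()
        return {i: self.voq[i][o].popleft() for i, o in match.items()}
```

Calling `step()` repeatedly drains the queues one matching at a time. Because each input keeps a separate queue per output, a flit waiting for a busy output does not block flits behind it that are headed to other outputs.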
