Scalable High-Radix Modular Crossbar Switches

Crossbars are a basic building block of networks on chip: they can serve as fast, single-stage networks or as router cores in larger-scale networks. However, scaling crossbars to high radices presents a number of efficiency, performance, and area challenges. We therefore propose modular flow-through crossbar switch cores that perform better at high radices than conventional monolithic designs. The modular sub-blocks are arranged in a controlled flow-through, pipelined scheme that eliminates global connections while maintaining linear performance scaling and high throughput. Modularity also enables energy savings by deactivating unused I/O wires. Evaluation with an analytical crossbar switch modeling tool shows an energy-delay product up to 5.3× better than that of conventional crossbar switches, at the cost of approximately 30% area overhead. We further evaluated modular crossbar networks built from the proposed switch cores using BookSim2, a detailed cycle-accurate network-on-chip simulator. The proposed design achieves more than 90% saturation capacity with an internal speedup of 1.5, supports data line rates as high as 102.4 Gbps (in 40 nm bulk CMOS), and offers lower average network latency than conventional crossbars.
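
For readers unfamiliar with the metric, the energy-delay product (EDP) is the energy per operation multiplied by the delay per operation, and an improvement factor is the ratio of the baseline EDP to the candidate EDP. The short Python sketch below illustrates that arithmetic only. The function names and all numeric values are hypothetical placeholders chosen for illustration; they are not measurements or results from this work.

```python
def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """Return EDP in joule-seconds: energy per operation times delay per operation."""
    return energy_j * delay_s


def edp_improvement(baseline: tuple, candidate: tuple) -> float:
    """Return how many times lower the candidate's EDP is than the baseline's.

    Each argument is an (energy_j, delay_s) pair.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Hypothetical (energy, delay) pairs, NOT values reported for the proposed design.
monolithic = (4.0e-12, 2.0e-9)  # assumed: 4 pJ per transfer, 2 ns delay
modular = (2.0e-12, 1.0e-9)     # assumed: 2 pJ per transfer, 1 ns delay

ratio = edp_improvement(monolithic, modular)
print(f"EDP improvement: {ratio:.1f}x")
```

Because EDP multiplies the two quantities, halving both energy and delay in this made-up example yields a fourfold improvement rather than a twofold one.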
