Internet-Router Buffered Crossbars Based on Networks on Chip

The scalability and performance of the Internet depend critically on the performance of its packet switches. Current packet switches are based on single-hop crossbar fabrics, with line cards that use virtual output queueing to reduce head-of-line blocking. In this paper we propose to use a multi-hop network on chip (NOC) as the crossbar fabric, with FIFO-queued line cards. A multi-hop crossbar fabric has several advantages. 1) Speed-up: the crossbar fabric can operate faster because NOC inter-router wires are shorter than the wires in a single-hop crossbar, and because arbitration is distributed instead of centralised. 2) Load balancing: paths from different input-output port pairs share the same router buffers, unlike the internal buffers of a buffered crossbar fabric, each of which is dedicated to a single input-output pair. 3) Path diversity: traffic from an input port can follow different paths to its destination output port, which results in further load balancing, especially for non-uniform traffic patterns. 4) Simpler line-card design: the use of FIFOs on the line cards simplifies both the line cards and the (inter-chip) flow control between the crossbar fabric and the line cards, reducing the number of (expensive) chip pins required for flow control. 5) Scalability: the crossbar speed is independent of the number of ports, which is not the case for single-hop crossbar fabrics. We evaluate the performance of our architecture both analytically and by simulation, and show that it performs well for a wide range of traffic conditions and switch sizes. Additionally, we prototyped a 32×32 NOC-based crossbar fabric in a 65 nm CMOS technology. The unoptimised implementation operates at 413 MHz, achieving an aggregate throughput in excess of 10^10 ATM cells per second.
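The reported aggregate throughput can be sanity-checked with a short calculation. The sketch below assumes that each of the 32 ports delivers one ATM cell per clock cycle; the abstract does not state this, so the function name and the `cells_per_port_per_cycle` parameter are illustrative assumptions rather than the authors' model.

```python
def aggregate_cells_per_second(ports: int, clock_hz: float,
                               cells_per_port_per_cycle: float = 1.0) -> float:
    """Aggregate cell rate of a switch fabric.

    Assumption (not from the abstract): every port transfers
    `cells_per_port_per_cycle` cells on each clock cycle.
    """
    return ports * clock_hz * cells_per_port_per_cycle


# Figures taken from the abstract: a 32x32 fabric clocked at 413 MHz.
rate = aggregate_cells_per_second(ports=32, clock_hz=413e6)
print(f"{rate:.3e} cells/s")
```

Under that assumption the product is roughly 1.3 × 10^10 cells per second, which agrees with the stated order of magnitude.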
