Design and performance of speculative flow control for high-radix datacenter interconnect switches

High-radix switches are desirable building blocks for large computer interconnection networks because they are better suited than low-radix switches to converting chip I/O bandwidth into low latency and low cost [J. Kim, W.J. Dally, B. Towles, A.K. Gupta, Microarchitecture of a high-radix router, in: Proc. ISCA 2005, Madison, WI, 2005]. Unfortunately, most existing switch architectures do not scale well to a large number of ports; for example, the complexity of the buffered crossbar architecture scales quadratically with the number of ports. Compounded with support for long round-trip times and many virtual channels, the overall buffer requirements limit the feasibility of such switches to modest port counts. As long as traditional credit flow control is employed at the link level, compromising on buffer sizing leads to a drastic increase in latency and a reduction in throughput. We propose a novel link-level flow control protocol that enables high-performance switches based on the increasingly popular buffered crossbar architecture to scale to higher port counts without sacrificing performance. By combining credited and speculative transmission, this scheme achieves reliable delivery, low latency, and high throughput, even with crosspoint buffers that are significantly smaller than the amount of data in flight during one round-trip time. The proposed scheme substantially reduces message latency and improves throughput of partially buffered crossbar switches loaded with synthetic uniform and non-uniform bursty traffic. Moreover, simulations replaying traces of several typical MPI applications demonstrate communication speedup factors of 2 to 10.
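The scaling argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the paper: it assumes every crosspoint holds one round-trip time of cells per virtual channel, and it uses the standard bound that a single flow under pure credit flow control cannot sustain more than buffer size divided by round-trip time. The function names, parameter names, and numeric values are all hypothetical and chosen only for illustration.

```python
def crosspoint_memory_cells(ports, rtt_cells, virtual_channels):
    """Total buffering (in cells) of a buffered crossbar, assuming each of
    the ports * ports crosspoints stores one RTT of cells per virtual channel.
    This sizing rule is an assumption made for illustration."""
    return ports * ports * rtt_cells * virtual_channels


def credit_throughput_bound(buffer_cells, rtt_cells):
    """Upper bound on the sustained rate of one flow (fraction of link rate)
    under pure credit flow control: a sender holding buffer_cells credits
    can send at most that many cells per round trip."""
    return min(1.0, buffer_cells / rtt_cells)


if __name__ == "__main__":
    # Hypothetical configuration: RTT of 16 cell times, 4 virtual channels.
    for ports in (32, 64, 128):
        total = crosspoint_memory_cells(ports, rtt_cells=16, virtual_channels=4)
        print(f"{ports:4d} ports -> {total} cells of crosspoint memory")

    # Shrinking each crosspoint buffer below one RTT caps credited throughput.
    for buffer_cells in (16, 8, 4):
        bound = credit_throughput_bound(buffer_cells, rtt_cells=16)
        print(f"buffer of {buffer_cells:2d} cells -> at most {bound:.2f} of link rate")
```

Under these assumptions, doubling the port count quadruples the total memory, while cutting a crosspoint buffer to a quarter of the round-trip time limits a credited flow to a quarter of the link rate. That is the trade-off the proposed combination of credited and speculative transmission is meant to avoid.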
