A Near-optimal Real-time Hardware Scheduler for Large Cardinality Crossbar Switches

The maximum matching algorithm for bipartite graphs can be used to provide optimal scheduling for crossbar-based interconnection networks. Unfortunately, maximum matching requires O(N^3) time for an N × N communication system, which has limited its application to real-time network scheduling. In this paper, we show how maximum matching can be reformulated in terms of Boolean operations rather than the more traditional formulations. By taking advantage of the inherent parallelism available in custom hardware design, we introduce three hardware implementations of maximum matching and show how design complexity can be traded for performance. Specifically, we examine a pure logic scheduler with three dimensions of parallelism, a matrix scheduler with two dimensions of parallelism, and a vector scheduler with one dimension of parallelism. These designs reduce the algorithmic time complexity to O(1), O(K), and O(KN), respectively, where K is the number of optimization steps. While an optimal scheduling algorithm requires K = 2N-1 steps, our simulation results show that the scheduler can achieve 99% of the optimal schedule when K = 9. We examine the hardware and time complexity of these architectures for crossbar sizes of up to N = 1024. Using FPGA synthesis results, we show that a greedy schedule for crossbars ranging from 8 × 8 to 256 × 256 can be optimized in less than 20 ns per optimization step. For crossbars reaching 1024 × 1024, the scheduling can be completed in approximately 10 μs with current technology and could reach under 90 ns with future technologies.
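The idea of starting from a greedy schedule and improving it in a bounded number of steps can be illustrated in software. The following is a minimal sketch, not the hardware design described above: it assumes each input's requests are stored as an integer bitmask, and it treats one "optimization step" as one augmenting-path attempt, which is an assumption about the abstract's K rather than the paper's definition. The names `greedy_match` and `optimize` are hypothetical.

```python
def greedy_match(req):
    """Greedy schedule. req[i] is a bitmask of the outputs input i requests.

    Returns match_in, where match_in[i] is the output granted to input i,
    or -1 if input i is left unmatched.
    """
    n = len(req)
    free_outputs = (1 << n) - 1          # all outputs start free
    match_in = [-1] * n
    for i, mask in enumerate(req):
        avail = mask & free_outputs      # requested AND still free
        if avail:
            j = (avail & -avail).bit_length() - 1   # lowest set bit
            match_in[i] = j
            free_outputs &= ~(1 << j)
    return match_in


def optimize(req, match_in, max_steps):
    """Improve a schedule with at most max_steps augmenting-path attempts.

    Assumption: one attempt stands in for one optimization step.
    Returns a new list; the input schedule is not modified.
    """
    n = len(req)
    match_in = list(match_in)
    match_out = [-1] * n                 # match_out[j] = input holding output j
    for i, j in enumerate(match_in):
        if j >= 0:
            match_out[j] = i

    def try_input(u, visited):
        # visited is a one-element list holding a bitmask of tried outputs
        cand = req[u] & ~visited[0]
        while cand:
            j = (cand & -cand).bit_length() - 1
            visited[0] |= 1 << j
            # Take output j if it is free, or if its holder can move elsewhere
            if match_out[j] == -1 or try_input(match_out[j], visited):
                match_out[j] = u
                match_in[u] = j
                return True
            cand = req[u] & ~visited[0]
        return False

    steps = 0
    for i in range(n):
        if steps >= max_steps:
            break
        if match_in[i] == -1:
            steps += 1
            try_input(i, [0])
    return match_in


# 3 x 3 example: input 0 requests outputs 0 and 1, input 1 requests
# output 0, input 2 requests output 2.
requests = [0b011, 0b001, 0b100]
greedy = greedy_match(requests)
improved = optimize(requests, greedy, max_steps=1)
```

In this example the greedy pass grants output 0 to input 0 and so blocks input 1; a single augmenting attempt moves input 0 to output 1 and frees output 0 for input 1. The recursive search is sequential and is only meant to show the greedy-then-refine structure; it says nothing about how the Boolean reformulation is parallelized in hardware.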
