The Complexity of FFT and Related Butterfly Algorithms on Meshes and Hypermeshes

Parallel FFT data-flow graphs based on a Butterfly graph followed by a bit-reversal permutation are known, as are optimal-order embeddings of these flow graphs onto meshes and hypercubes. Embeddings onto a 2D mesh require O(√N) data transfer steps and O(log N) computation steps. Embeddings onto a hypercube require O(log N) data transfer steps and O(log N) computation steps. A similar FFT algorithm, with O(log N) computation steps and O(log N) data transfer steps, is proposed for the recently introduced "hypermesh". The performance complexity of the FFT algorithm on all three interconnection networks is then compared, based on the assumptions that (1) all networks are built with discrete crossbar switches interconnected with transmission lines, (2) all networks compared have equivalent aggregate bandwidth, and (3) the packet transmission time is inversely proportional to the link bandwidth. The algorithms are viewed at the "word level" of abstraction, where every packet is treated as an indivisible unit. Under these assumptions, it is concluded that for practical network sizes the 2D hypermesh is faster than the 2D mesh and the binary hypercube by factors of O(√N/log N) and O(log N), respectively. For the computation of a 4K-sample FFT on 4K-processor networks, the hypermesh is roughly 27 times faster than a 2D mesh and 10 times faster than a binary hypercube. Variations in the assumptions may affect the end results slightly; these conclusions may not hold when the network is implemented entirely on a single wafer, but this scenario is unlikely for the next decade or two. These complexity results indicate that the hypermesh is the preferred interconnection scheme for parallel supercomputers constructed from discrete components.
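The butterfly-then-bit-reversal structure and the data transfer step orders quoted above can be illustrated with a small sketch. The Python below is a minimal, hypothetical model, not the paper's implementation. `fft_butterfly_then_bitreverse` uses the standard radix-2 decimation-in-frequency ordering: log2(N) butterfly stages whose output lands in bit-reversed order, followed by a bit-reversal permutation. `butterfly_transfer_steps` counts data transfer steps for the butterfly stages only, under assumed conditions: element i is placed on processor i, processors are laid out row-major on a square √N × √N mesh or hypermesh, mesh exchanges are done as systolic shifts (one hop per step), and each hypermesh row or column hyperedge can route any permutation among its nodes in one step. The cost of the final bit-reversal permutation is not modeled. All function names are illustrative.

```python
import cmath


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i (e.g. 0b110 -> 0b011 for bits=3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_butterfly_then_bitreverse(x):
    """Radix-2 decimation-in-frequency FFT: log2(n) butterfly stages, then bit reversal."""
    n = len(x)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    h = n // 2  # butterfly span: this stage pairs index i with i XOR h
    while h >= 1:
        for start in range(0, n, 2 * h):
            for k in range(h):
                w = cmath.exp(-2j * cmath.pi * k / (2 * h))  # twiddle factor
                u, v = a[start + k], a[start + k + h]
                a[start + k] = u + v
                a[start + k + h] = (u - v) * w
        h //= 2
    # After the butterfly stages, a[j] holds X[bit_reverse(j)]; undo that ordering.
    return [a[bit_reverse(i, bits)] for i in range(n)]


def butterfly_transfer_steps(n: int, network: str) -> int:
    """Count data transfer steps for the butterfly stages only (bit reversal excluded).

    Hypothetical cost model: element i lives on processor i; on the 2D mesh and the
    2D hypermesh, processor i sits at row i >> m, column i & (side - 1), side = 2**m.
    """
    bits = n.bit_length() - 1
    if n < 4 or (1 << bits) != n or bits % 2:
        raise ValueError("n must be an even power of two (a square network)")
    m = bits // 2
    total = 0
    for s in range(bits):  # stage s exchanges data between i and i XOR 2**s
        if network == "mesh":
            # Partner is 2**(s mod m) hops away along a row (s < m) or a column
            # (s >= m); a systolic shift covers that distance one hop per step.
            total += 1 << (s % m)
        elif network in ("hypercube", "hypermesh"):
            # Hypercube: partner is a direct neighbor. Hypermesh: partner shares a
            # row or column hyperedge, assumed to route any permutation in one step.
            total += 1
        else:
            raise ValueError(f"unknown network: {network}")
    return total


if __name__ == "__main__":
    for net in ("mesh", "hypercube", "hypermesh"):
        print(net, butterfly_transfer_steps(4096, net))
```

For N = 4096 the sketch prints 126 steps for the mesh (2(√N − 1) with √N = 64) and 12 for both the hypercube and the hypermesh, matching the O(√N) and O(log N) orders stated above. Because the hypercube and hypermesh step counts coincide, the O(log N) advantage claimed for the hypermesh over the hypercube must come from the per-step packet time under assumptions (2) and (3), which this step count deliberately leaves out.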
