Efficient communication primitives on hypercubes

We give practical algorithms, complexity analyses and implementations for one-to-all broadcasting, all-to-all personalized communication and matrix transpose (with two-dimensional partitioning of the matrix) on hypercubes. We assume circuit-switched communication, e-cube routing and a one-port communication model. For one-to-all broadcasting, we give an algorithm that combines the well-known recursive doubling algorithm[1] and the algorithm based on edge-disjoint spanning trees[2]. The measured times of the combined algorithm are always superior to those of the edge-disjoint spanning tree algorithm and outperform the recursive doubling algorithm. For all-to-all personalized communication, we propose a hybrid algorithm that combines the well-known recursive doubling algorithm[3,4] and the recently proposed direct-route algorithm[5,6]. Our hybrid algorithm balances the data transfer time and the start-up time of these two algorithms, and its communication complexity is estimated to be better than that of the two previous algorithms for a range of machine parameters. For matrix transpose with two-dimensional partitioning of the matrix, we relate a two-phase algorithm to the previous result in Reference 7. The algorithm is predicted to be better than the recursive transpose algorithm[8] by n nearest-neighbor communications[4]. It takes advantage of circuit-switched routing and is congestion-free within each phase. We also suggest a way of storing the matrix such that the transpose operation can be realized in one phase without congestion.
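As a point of reference for the broadcasting result, the textbook recursive doubling scheme can be sketched as a small simulation. This is only an illustrative sketch of the standard technique, not the combined algorithm proposed here. The function name `recursive_doubling_broadcast` and its parameters are hypothetical, and the simulation assumes every node that already holds the message forwards it across one dimension per step, which respects the one-port model.

```python
def recursive_doubling_broadcast(n, root=0):
    """Simulate a one-to-all broadcast on an n-dimensional hypercube.

    Returns a list of n steps; each step is a list of (sender, receiver)
    pairs. Hypothetical helper for illustration only.
    """
    informed = {root}          # nodes that already hold the message
    schedule = []
    for d in range(n):         # one step per hypercube dimension
        # Every informed node sends to its neighbor across dimension d,
        # so each node performs at most one send per step (one-port).
        step = [(node, node ^ (1 << d)) for node in sorted(informed)]
        informed.update(dst for _, dst in step)
        schedule.append(step)
    return schedule


if __name__ == "__main__":
    for d, step in enumerate(recursive_doubling_broadcast(3)):
        print(f"step {d}: {step}")
```

Under these assumptions the number of informed nodes doubles at every step, so all 2^n nodes are reached after n nearest-neighbor communication steps.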

[1]  J. O. Eklundh,et al.  A Fast Computer Method for Matrix Transposing , 1972, IEEE Transactions on Computers.

[2]  Alfred V. Aho,et al.  The Design and Analysis of Computer Algorithms , 1974 .

[3]  A large scale, homogeneous, fully distributed parallel machine, I , 1977, ISCA '77.

[4]  H. T. Kung,et al.  Sorting on a mesh-connected parallel computer , 1977, CACM.

[5]  Sartaj Sahni,et al.  An optimal routing algorithm for mesh-connected Parallel computers , 1980, JACM.

[6]  Pen-Chung Yew,et al.  An Easily Controlled Network for Frequently Used Permutations , 1981, IEEE Transactions on Computers.

[7]  Franco P. Preparata,et al.  The cube-connected-cycles: A versatile network for parallel computation , 1979, 20th Annual Symposium on Foundations of Computer Science (sfcs 1979).

[8]  Dennis Gannon,et al.  On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms , 1984, IEEE Transactions on Computers.

[9]  Jeffrey D. Ullman.  Computational Aspects of VLSI , 1984 .

[10]  M. Heath,et al.  Matrix factorization on a hypercube multiprocessor , 1985 .

[11]  S. Lennart Johnsson,et al.  Algorithms for Matrix Transposition on Boolean n-Cube Configured Ensemble Architectures , 1988, ICPP.

[12]  Peter R. Cappello,et al.  Gaussian Elimination on a Hypercube Automaton , 1987, J. Parallel Distributed Comput.

[13]  R. Arlauskas,et al.  iPSC/2 system: a second generation hypercube , 1988, C3P.

[14]  M. H. Schultz,et al.  Topological properties of hypercubes , 1988, IEEE Trans. Computers.

[15]  Ching-Tien Ho,et al.  Computing Fast Fourier Transforms On Boolean Cubes And Related Networks , 1988, Optics & Photonics.

[16]  G. C. Fox,et al.  Optimal communication algorithms for regular decompositions on the hypercube , 1988, C3P.

[17]  S. F. Nugent,et al.  The iPSC/2 direct-connect communications technology , 1988, C3P.

[18]  P. Close The iPSC/2 node architecture , 1988, C3P.

[19]  Lennart Johnsson Matrix Multiplication on Boolean Cubes using Generic Communication Primitives , 1989 .

[20]  David Nassimi A Fault-Tolerant Routing Algorithm for BPC Permutations on Multistage Interconnection Networks , 1989, ICPP.

[21]  Dirk Roose,et al.  Benchmarking the iPSC/2 Hypercube Multiprocessor , 1989, Concurr. Pract. Exp.

[22]  S. Lennart Johnsson,et al.  Optimum Broadcasting and Personalized Communication in Hypercubes , 1989, IEEE Trans. Computers.

[23]  Ching-Tien Ho,et al.  Optimal communication primitives and graph embeddings on hypercubes , 1990 .

[24]  Abdulla Bataineh,et al.  Load balanced sort on hypercube multiprocessors , 1990, Proceedings of the Fifth Distributed Memory Computing Conference, 1990.

[25]  S. R. Seidel,et al.  Refining the Communication Model for the Intel iPSC/2 , 1990, Proceedings of the Fifth Distributed Memory Computing Conference, 1990.

[26]  Quentin F. Stout,et al.  Intensive Hypercube Communication. Prearranged Communication in Link-Bound Machines , 1990, J. Parallel Distributed Comput.

[27]  Geoffrey C. Fox,et al.  An Automatic and Symbolic Parallelization System for Distributed Memory Parallel Computers , 1990, Proceedings of the Fifth Distributed Memory Computing Conference, 1990.

[28]  John N. Tsitsiklis,et al.  Optimal Communication Algorithms for Hypercubes , 1991, J. Parallel Distributed Comput.

[29]  Alan Edelman,et al.  Optimal Matrix Transposition and Bit Reversal on Hypercubes: All-to-All Personalized Communication , 1991, J. Parallel Distributed Comput.

[30]  Ching-Tien Ho,et al.  Maximizing Channel Utilization for All-to-All Personalized Communication on Boolean cubes , 1991 .

[31]  Pierre Fraigniaud,et al.  Complexity Analysis of Broadcasting in Hypercubes with Restricted Communication Capabilities , 1992, J. Parallel Distributed Comput.