Efficient communication primitives on hypercubes

We give practical algorithms, complexity analyses, and implementations for one-to-all broadcasting, all-to-all personalized communication, and matrix transpose (with two-dimensional partitioning of the matrix) on hypercubes. We assume circuit-switched communication, e-cube routing, and a one-port communication model. For one-to-all broadcasting, we give an algorithm that combines the well-known recursive doubling algorithm[1] and the algorithm based on edge-disjoint spanning trees[2]. The measured times of the combined algorithm are always better than those of the edge-disjoint spanning tree algorithm, and the combined algorithm also outperforms the recursive doubling algorithm. For all-to-all personalized communication, we propose a hybrid algorithm that combines the well-known recursive doubling algorithm[3,4] and the recently proposed direct-route algorithm[5,6]. Our hybrid algorithm balances the data transfer time and the start-up time of these two algorithms. Its communication complexity is estimated to be better than that of either previous algorithm for a range of machine parameters. For matrix transpose with two-dimensional partitioning of the matrix, we relate a two-phase algorithm to a previous result in Reference 7. On an n-dimensional hypercube, the algorithm is predicted to beat the recursive transpose algorithm[8] by n nearest-neighbor communications[4]. It takes advantage of circuit-switched routing and is congestion-free within each phase. We also suggest a way of storing the matrix so that the transpose can be realized in one phase without congestion.
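The trade-off between start-up time and data transfer time for all-to-all personalized communication can be illustrated with a simple cost model. This model is our own assumption and is not the paper's analysis. It assumes the following:

- The hypercube has dimension n.
- Each source-destination pair exchanges m elements.
- Each message costs a start-up time tau plus a per-element time tc.
- Distance is ignored, as an idealization of circuit switching, and there is no congestion.

The n dimensions are processed in groups. A group of d dimensions is exchanged by direct routing in 2^d - 1 steps, each sending m * 2^(n-d) elements. Processing dimensions one at a time gives the recursive doubling cost n * (tau + m * 2^(n-1) * tc). One group covering all n dimensions gives the direct-route cost (2^n - 1) * (tau + m * tc). The grouping scheme and every function name are hypothetical illustrations of the balance described above, not the authors' hybrid algorithm.

```python
# Hypothetical cost model (an assumption, not the paper's analysis) for
# all-to-all personalized communication on an n-dimensional hypercube.
# n: cube dimension, m: elements per source-destination pair,
# tau: start-up time per message, tc: transfer time per element.


def phase_cost(d, n, m, tau, tc):
    """Modeled time of a direct-route exchange within a group of d cube dimensions.

    Each node performs 2**d - 1 pairwise exchanges; every message bundles the
    2**(n - d) blocks of m elements bound for the partner's side of the group.
    """
    return (2 ** d - 1) * (tau + m * 2 ** (n - d) * tc)


def total_cost(parts, n, m, tau, tc):
    """Modeled time when the n dimensions are handled in groups of sizes `parts`."""
    assert sum(parts) == n, "groups must cover every dimension exactly once"
    return sum(phase_cost(d, n, m, tau, tc) for d in parts)


def partitions(n, max_part=None):
    """Yield the integer partitions of n, each as a non-increasing list."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for p in range(min(n, max_part), 0, -1):
        for rest in partitions(n - p, p):
            yield [p] + rest


def best_split(n, m, tau, tc):
    """Return (cost, parts) with the smallest modeled time over all groupings."""
    return min((total_cost(parts, n, m, tau, tc), parts) for parts in partitions(n))


if __name__ == "__main__":
    n, tau, tc = 4, 100.0, 1.0  # arbitrary illustrative parameters
    print("recursive doubling:", total_cost([1] * n, n, 50, tau, tc))  # 2000.0
    print("direct route:      ", total_cost([n], n, 50, tau, tc))      # 2250.0
    for m in (1, 50, 1000):
        print(m, best_split(n, m, tau, tc))
    # 1 (432.0, [1, 1, 1, 1])
    # 50 (1800.0, [2, 2])
    # 1000 (16500.0, [4])
```

With these arbitrary parameters, the preferred grouping depends on block size:

- Small blocks favor one-dimension-at-a-time exchange, which needs fewer start-ups.
- Large blocks favor a single direct-route phase, which moves less data.
- An intermediate block size favors a mixed grouping.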

C. T. Howard Ho | M. T. Raghunath

[1] J. O. Eklundh, et al. A Fast Computer Method for Matrix Transposing, 1972, IEEE Transactions on Computers.

[2] Alfred V. Aho, et al. The Design and Analysis of Computer Algorithms, 1974.

[3] A large scale, homogeneous, fully distributed parallel machine, I, 1977, ISCA '77.

[4] H. T. Kung, et al. Sorting on a mesh-connected parallel computer, 1977, CACM.

[5] Sartaj Sahni, et al. An optimal routing algorithm for mesh-connected parallel computers, 1980, JACM.

[6] Pen-Chung Yew, et al. An Easily Controlled Network for Frequently Used Permutations, 1981, IEEE Transactions on Computers.

[7] Franco P. Preparata, et al. The cube-connected-cycles: A versatile network for parallel computation, 1979, 20th Annual Symposium on Foundations of Computer Science (sfcs 1979).

[8] Dennis Gannon, et al. On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms, 1984, IEEE Transactions on Computers.

[9] Jeffrey D. Ullman. Computational Aspects of VLSI, 1984.

[10] M. Heath, et al. Matrix factorization on a hypercube multiprocessor, 1985.

[11] S. Lennart Johnsson, et al. Algorithms for Matrix Transposition on Boolean n-Cube Configured Ensemble Architectures, 1988, ICPP.

[12] Peter R. Cappello, et al. Gaussian Elimination on a Hypercube Automaton, 1987, J. Parallel Distributed Comput.

[13] R. Arlauskas, et al. iPSC/2 system: a second generation hypercube, 1988, C3P.

[14] M. H. Schultz, et al. Topological properties of hypercubes, 1988, IEEE Trans. Computers.

[15] Ching-Tien Ho, et al. Computing Fast Fourier Transforms On Boolean Cubes And Related Networks, 1988, Optics & Photonics.

[16] G. C. Fox, et al. Optimal communication algorithms for regular decompositions on the hypercube, 1988, C3P.

[17] S. F. Nugent, et al. The iPSC/2 direct-connect communications technology, 1988, C3P.

[18] P. Close. The iPSC/2 node architecture, 1988, C3P.

[19] Lennart Johnsson. Matrix Multiplication on Boolean Cubes using Generic Communication Primitives, 1989.

[20] David Nassimi. A Fault-Tolerant Routing Algorithm for BPC Permutations on Multistage Interconnection Networks, 1989, ICPP.

[21] Dirk Roose, et al. Benchmarking the iPSC/2 Hypercube Multiprocessor, 1989, Concurr. Pract. Exp.

[22] S. Lennart Johnsson, et al. Optimum Broadcasting and Personalized Communication in Hypercubes, 1989, IEEE Trans. Computers.

[23] Ching-Tien Ho, et al. Optimal communication primitives and graph embeddings on hypercubes, 1990.

[24] Abdulla Bataineh, et al. Load balanced sort on hypercube multiprocessors, 1990, Proceedings of the Fifth Distributed Memory Computing Conference, 1990.

[25] S. R. Seidel, et al. Refining the Communication Model for the Intel iPSC/2, 1990, Proceedings of the Fifth Distributed Memory Computing Conference, 1990.

[26] Quentin F. Stout, et al. Intensive Hypercube Communication. Prearranged Communication in Link-Bound Machines, 1990, J. Parallel Distributed Comput.

[27] Geoffrey C. Fox, et al. An Automatic and Symbolic Parallelization System for Distributed Memory Parallel Computers, 1990, Proceedings of the Fifth Distributed Memory Computing Conference, 1990.

[28] John N. Tsitsiklis, et al. Optimal Communication Algorithms for Hypercubes, 1991, J. Parallel Distributed Comput.

[29] Alan Edelman, et al. Optimal Matrix Transposition and Bit Reversal on Hypercubes: All-to-All Personalized Communication, 1991, J. Parallel Distributed Comput.

[30] Ching-Tien Ho, et al. Maximizing Channel Utilization for All-to-All Personalized Communication on Boolean cubes, 1991.

[31] Pierre Fraigniaud, et al. Complexity Analysis of Broadcasting in Hypercubes with Restricted Communication Capabilities, 1992, J. Parallel Distributed Comput.