Paper information - Communication-efficient implementation of block recursive algorithms on distributed-memory machines

This paper presents a design methodology for developing efficient distributed-memory parallel programs for block-recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. The methodology is specifically suited to most modern supercomputers that have a distributed-memory architecture with a circuit-switched or wormhole-routed mesh or hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used to represent algorithms. Communication-efficient implementations that effectively overlap computation and communication are obtained by manipulating the mathematical representation using tensor algebra. Performance results for FFT programs on the Intel iPSC/860 and Intel Paragon are presented.
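As a rough illustration of what a tensor-product representation of a block-recursive algorithm looks like, the sketch below checks the standard Cooley-Tukey factorization of the DFT matrix, F_{rs} = (F_r ⊗ I_s) T (I_r ⊗ F_s) L, on small sizes. This is the textbook factorization, not necessarily the exact formulation used in the paper, and all function names (`kron`, `stride_permutation`, `twiddle`, `factored_dft`) are hypothetical helpers written for this example. Dense matrices are used only to make the algebra visible; a real implementation would never form them.

```python
import cmath


def dft_matrix(n):
    """Dense n-point DFT matrix with entries w^(j*k), w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two dense matrices."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def stride_permutation(n, r):
    """Stride permutation L: output index a*s + b reads input index b*r + a."""
    s = n // r
    p = [[0.0] * n for _ in range(n)]
    for a in range(r):
        for b in range(s):
            p[a * s + b][b * r + a] = 1.0
    return p


def twiddle(r, s):
    """Diagonal twiddle matrix T: entry a*s + c is w_n^(a*c), n = r*s."""
    n = r * s
    w = cmath.exp(-2j * cmath.pi / n)
    d = [[0.0] * n for _ in range(n)]
    for a in range(r):
        for c in range(s):
            d[a * s + c][a * s + c] = w ** (a * c)
    return d


def factored_dft(r, s):
    """Multiply out (F_r (x) I_s) * T * (I_r (x) F_s) * L for n = r*s."""
    n = r * s
    stages = [
        kron(dft_matrix(r), identity(s)),   # r-point DFTs at stride s
        twiddle(r, s),                      # pointwise scaling
        kron(identity(r), dft_matrix(s)),   # r independent s-point DFTs
        stride_permutation(n, r),           # data reordering
    ]
    result = stages[0]
    for m in stages[1:]:
        result = matmul(result, m)
    return result


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    print(max_abs_diff(factored_dft(2, 4), dft_matrix(8)))
```

In this kind of formulation, a factor of the form I_r ⊗ F_s can be read as r independent local transforms, while the stride permutation is a pure data reordering, which on a distributed-memory machine would typically correspond to communication. Rewriting the formula with tensor-algebra identities changes where those reorderings occur, which is the general idea behind deriving communication-efficient variants.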
