On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms

This paper describes two models of the cost of data movement in parallel numerical algorithms. The first generalizes an approach due to Hockney and is suited to shared-memory multiprocessors in which each processor has vector capabilities. The second applies to highly parallel non-shared-memory MIMD systems and characterizes algorithm performance in terms of the communication network design. Techniques from VLSI complexity theory are then used to derive algorithm-independent upper bounds on system performance for several problems that are important to scientific computation.
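
For background on the approach attributed to Hockney, the following is a minimal sketch of the classical two-parameter vector timing model, in which an operation on a vector of length n takes time (n + n_half) / r_inf. It is not the generalized model the paper develops, which the abstract does not spell out. The function names and sample values are illustrative assumptions.

```python
def hockney_time(n, r_inf, n_half):
    """Time for a length-n vector operation under the classical
    Hockney model: t(n) = (n + n_half) / r_inf.

    r_inf  : asymptotic rate (results per unit time) for very long vectors
    n_half : vector length at which half of r_inf is achieved
    All names here are illustrative, not taken from the paper.
    """
    return (n + n_half) / r_inf


def hockney_rate(n, r_inf, n_half):
    """Achieved rate n / t(n), which simplifies to r_inf / (1 + n_half / n)."""
    return r_inf / (1.0 + n_half / n)


if __name__ == "__main__":
    # Hypothetical machine parameters, chosen only to show the trend.
    r_inf, n_half = 50.0, 100.0
    for n in (10, 100, 1000, 10000):
        print(n, hockney_time(n, r_inf, n_half), hockney_rate(n, r_inf, n_half))
```

In this model the startup overhead is absorbed into n_half, so short vectors run far below the asymptotic rate, and the achieved rate is exactly half of r_inf when n equals n_half.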
