Communication primitives for unstructured finite element simulations on data parallel architectures

Efficient data motion is critical for high-performance computing on distributed-memory architectures. The value of several techniques for efficient data motion is illustrated by identifying generic communication primitives, and the efficiency of these primitives is demonstrated on three applications that use the finite element method on unstructured grids together with sparse solvers with differing communication requirements. For the applications presented, the advocated techniques reduced communication times by a factor of 1.5 to 3.
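The dominant communication pattern in unstructured finite element assembly is an irregular gather of nodal data into element-local storage, followed by a scatter-add of element contributions back to shared nodes. The sketch below is a minimal serial illustration of that primitive, not the paper's implementation; the mesh, connectivity table, and per-element operation are all made up for the example.

```python
# Hypothetical sketch of the gather/scatter primitive behind unstructured
# finite element assembly; all data below is illustrative, not from the paper.
import numpy as np

# Tiny unstructured mesh: 4 nodes, 3 two-node elements (a connectivity table).
connectivity = np.array([[0, 1], [1, 2], [2, 3]])   # element -> node indices
nodal_values = np.array([1.0, 2.0, 3.0, 4.0])

# Gather: fetch nodal data into element-local storage via irregular indexing.
element_values = nodal_values[connectivity]          # shape (3, 2)

# Stand-in element computation: each element contributes its mean to its nodes.
element_contrib = element_values.mean(axis=1, keepdims=True).repeat(2, axis=1)

# Scatter-add: accumulate element contributions back onto shared nodes.
# np.add.at performs unbuffered accumulation, so repeated node indices
# (nodes shared by several elements) are summed correctly.
result = np.zeros_like(nodal_values)
np.add.at(result, connectivity, element_contrib)
# result -> [1.5, 4.0, 6.0, 3.5]: interior nodes receive two contributions.
```

On a distributed-memory machine the same pattern becomes inter-processor communication whenever an element and one of its nodes live on different processors, which is why grouping these gathers and scatters into generic primitives pays off.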
