Comparison of message-passing and shared memory implementations of the GMRES method on MIMD computers

In this paper we compare different parallel implementations of the same algorithm for solving nonlinear simulation problems on unstructured meshes. The first implementation uses the message-passing programming model and the PVM system and is based on domain decomposition of the unstructured mesh, while the second takes advantage of the inherent parallelism of the algorithm by adopting the shared-memory programming model. Both implementations are applied to the preconditioned GMRES method, which iteratively solves the system of linear equations. A combined approach, the hybrid programming model suitable for multicomputers with SMP nodes, is also introduced. For performance measurements we use a compressible fluid flow simulation in which sequences of finite element solutions form time approximations to the Euler equations. The tests are performed on HP SPP1600, HP S2000 and SGI Origin2000 multiprocessors, and we report wall-clock execution time and speedup for different numbers of processing nodes and for different meshes. Experimentally, the explicit (message-passing) programming model proves to be 20-70% more efficient than the implicit (shared-memory) model, depending on the mesh and the machine.
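
For orientation, the GMRES iteration at the core of both implementations can be sketched serially as follows. This is a minimal illustration only, not the code used in the paper: it omits the preconditioner, restarts, and all parallelism, and assumes a small dense nonsingular matrix stored as nested lists. The names `gmres`, `matvec`, and `dot` are hypothetical. The matrix-vector product and the inner products in the Arnoldi loop are the operations a parallel version would typically distribute across processors.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product (hypothetical helper)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product (hypothetical helper)."""
    return sum(a * b for a, b in zip(u, v))


def gmres(A, b, tol=1e-12, max_iter=None):
    """Unrestarted, unpreconditioned GMRES with zero initial guess.

    Assumes A is nonsingular. Uses Arnoldi with modified Gram-Schmidt
    and Givens rotations to keep the Hessenberg matrix triangular.
    """
    n = len(b)
    m = max_iter or n
    beta = math.sqrt(dot(b, b))
    if beta == 0.0:
        return [0.0] * n
    V = [[bi / beta for bi in b]]  # orthonormal Krylov basis vectors
    H = []                         # columns of the rotated Hessenberg matrix
    cs, sn = [], []                # Givens cosines and sines
    g = [beta]                     # rotated right-hand side
    for j in range(m):
        w = matvec(A, V[j])
        h = []
        for i in range(j + 1):     # modified Gram-Schmidt orthogonalization
            hij = dot(w, V[i])
            h.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, V[i])]
        h_next = math.sqrt(dot(w, w))
        for i in range(j):         # apply the previous rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[j], h_next)  # nonzero when A is nonsingular
        c, s = h[j] / r, h_next / r
        cs.append(c)
        sn.append(s)
        h[j] = r                   # the new rotation eliminates h_next
        g.append(-s * g[j])
        g[j] = c * g[j]
        H.append(h)
        if abs(g[j + 1]) <= tol * beta:  # |g[j+1]| equals the residual norm
            break
        V.append([wk / h_next for wk in w])
    k = len(H)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution on the triangular system
        acc = g[i] - sum(H[col][i] * y[col] for col in range(i + 1, k))
        y[i] = acc / H[i][i]
    return [sum(y[col] * V[col][i] for col in range(k)) for i in range(n)]


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 11.0, 8.0]  # equals A applied to [1, 2, 3]
    print(gmres(A, b))
```

In a message-passing version each inner product would generally require a global reduction, whereas a shared-memory version can parallelize the same loops directly; the sketch does not model either.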
