Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

The conjugate gradient (CG) algorithm is perhaps the best-known iterative technique for solving sparse linear systems that are symmetric and positive definite. For ill-conditioned systems, a preconditioning technique is often necessary. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0)-preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications: (1) ordering significantly improves overall performance on both distributed and distributed shared-memory systems; (2) cache reuse may be more important than reducing communication; (3) it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution; and (4) a hybrid MPI + OpenMP paradigm increases programming complexity with little performance gain. A multithreaded implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
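
For readers unfamiliar with the CG kernel discussed above, the following is a minimal serial sketch in pure Python, not the parallel implementation studied in the paper. It assumes the matrix is stored in compressed sparse row (CSR) form; the helper names (`csr_matvec`, `conjugate_gradient`, `laplacian_1d_csr`) and the 1D Laplacian test matrix are illustrative assumptions, not taken from the source. The sparse matrix-vector product is the loop whose memory access pattern depends on how the rows and columns are ordered.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """Multiply a CSR matrix by a dense vector (the kernel that ordering affects)."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]  # indirect access into x
        y[i] = s
    return y


def dot(u, v):
    """Dense dot product."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite CSR matrix.

    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if math.sqrt(rs_old) <= tol * b_norm:
            return x, it
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


def laplacian_1d_csr(n):
    """Hypothetical test matrix: tridiagonal [-1, 2, -1], which is SPD."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


if __name__ == "__main__":
    indptr, indices, data = laplacian_1d_csr(8)
    b = csr_matvec(indptr, indices, data, [1.0] * 8)  # exact solution is all ones
    x, iters = conjugate_gradient(indptr, indices, data, b)
    print(iters, [round(xi, 6) for xi in x])
```

In this sketch, reordering the unknowns would change only the contents of `indptr`, `indices`, and `data`. The CG loop itself is unchanged, which is why ordering can be studied independently of the solver.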
