AD-A 270 601 Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors

In this paper we present a new technique for sparse matrix multiplication on vector multiprocessors based on an efficient implementation of a segmented sum operation. We describe how the segmented sum can be implemented on vector multiprocessors so that it both fully vectorizes within each processor and parallelizes across processors. Because our method is insensitive to relative row size, it is better suited than the Ellpack/Itpack or Jagged Diagonal algorithms for matrices with a varying number of non-zero elements per row. Furthermore, our approach requires less preprocessing (no more time than a single sparse matrix-vector multiplication) and less auxiliary storage, and it uses a more convenient data representation (an augmented form of the standard compressed sparse row format). We have implemented our algorithm (SEGMV) on the Cray Y-MP C90 and compared its performance with other methods on a variety of sparse matrices from the Harwell-Boeing collection and from industrial application codes. On the test matrices, SEGMV is up to 3 times faster than the Jagged Diagonal algorithm and up to 5 times faster than the Ellpack/Itpack method. Our preprocessing is an order of magnitude faster than that of the Jagged Diagonal algorithm. Also, using an assembly language implementation of SEGMV on a 16-processor C90, the NAS Conjugate Gradient benchmark runs at 3.5 gigaflops.
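The core idea of computing y = Ax with a segmented sum can be illustrated with a small sequential sketch. This is not the authors' SEGMV or its Cray assembly implementation, and several details are assumptions:

* The "augmented" CSR is assumed to be ordinary CSR (row_ptr, col_idx, vals) plus one head flag per non-zero. The abstract does not spell out the augmentation.
* The processors are emulated by equal-length chunks of the non-zero array, and those chunks are processed one after another in plain Python.
* The helper names csr_head_flags, chunked_segmented_scan and segmented_spmv, and the num_procs parameter, are hypothetical.

Because the chunks are cut by non-zero count rather than by row, the work per chunk does not depend on how the non-zeros are distributed across rows.

```python
from typing import List, Sequence


def csr_head_flags(row_ptr: Sequence[int], nnz: int) -> List[bool]:
    """Hypothetical preprocessing (assumed form of 'augmented CSR'):
    one flag per stored non-zero, True where a new row begins.
    A single pass over row_ptr."""
    flags = [False] * nnz
    for start in row_ptr[:-1]:
        if start < nnz:  # an empty row shares its start index with the next row
            flags[start] = True
    return flags


def chunked_segmented_scan(values: Sequence[float], flags: Sequence[bool],
                           num_chunks: int) -> List[float]:
    """Inclusive segmented sum over equal-length chunks that ignore row
    boundaries. Each chunk stands in for one processor; here the chunks
    simply run one after another."""
    n = len(values)
    if n == 0:
        return []
    size = -(-n // num_chunks)  # ceiling division: non-zeros per chunk
    bounds = [(lo, min(lo + size, n)) for lo in range(0, n, size)]
    out = [0.0] * n
    last: List[float] = []     # local running sum at the end of each chunk
    has_head: List[bool] = []  # whether each chunk contains a segment start

    # Phase 1: independent local scans (the per-processor work).
    for lo, hi in bounds:
        acc, seen = 0.0, False
        for k in range(lo, hi):
            if flags[k]:
                acc, seen = values[k], True  # a new row restarts the sum
            else:
                acc += values[k]
            out[k] = acc
        last.append(acc)
        has_head.append(seen)

    # Phase 2: short serial pass over chunks to get each chunk's carry-in.
    carry = [0.0] * len(bounds)
    for c in range(1, len(bounds)):
        carry[c] = last[c - 1] + (0.0 if has_head[c - 1] else carry[c - 1])

    # Phase 3: add the carry-in to elements before a chunk's first row start.
    for (lo, hi), cin in zip(bounds, carry):
        for k in range(lo, hi):
            if flags[k]:
                break
            out[k] += cin
    return out


def segmented_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
                   vals: Sequence[float], x: Sequence[float],
                   num_procs: int = 4) -> List[float]:
    """y = A x for A in CSR form: elementwise products, then a segmented sum."""
    nnz = len(vals)
    products = [vals[k] * x[col_idx[k]] for k in range(nnz)]
    flags = csr_head_flags(row_ptr, nnz)
    scanned = chunked_segmented_scan(products, flags, num_procs)
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        # A row's sum is the scan value at its last non-zero; empty rows give 0.
        y.append(scanned[end - 1] if end > start else 0.0)
    return y


if __name__ == "__main__":
    # 3x3 example with rows [1 0 2], [0 0 0], [3 4 5] and x = [1, 2, 3]
    y = segmented_spmv([0, 2, 2, 5], [0, 2, 0, 1, 2],
                       [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0], num_procs=2)
    print(y)  # [7.0, 0.0, 26.0]
```

In this sketch, only Phase 2 is serial, and its length is the number of chunks rather than the number of rows. That small serial step is what lets the per-chunk loops run independently.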
