Data distribution schemes of sparse arrays on distributed memory multicomputers

Abstract A data distribution scheme for sparse arrays on a distributed memory multicomputer is, in general, composed of three phases: data partition, data distribution, and data compression. To implement such a scheme, many methods proposed in the literature perform the data partition phase first, then the data distribution phase, and finally the data compression phase. We call a data distribution scheme with this order the Send Followed Compress (SFC) scheme. In this paper, we propose two other data distribution schemes for sparse arrays, Compress Followed Send (CFS) and Encoding-Decoding (ED). In the CFS scheme, the data compression phase is performed before the data distribution phase. In the ED scheme, the data compression phase is divided into two steps, encoding and decoding, which are performed before and after the data distribution phase, respectively. To evaluate the CFS and ED schemes, we compare them with the SFC scheme. In the data partition phase, the row partition, the column partition, and the 2D mesh partition, each with and without load balancing, are used for all three schemes. In the compression phase, the CRS/CCS methods are used to compress sparse local arrays for the SFC and CFS schemes, while the encoding and decoding steps are used for the ED scheme. Both theoretical analysis and experimental tests were conducted. In the theoretical analysis, we analyze the SFC, CFS, and ED schemes in terms of data distribution time and data compression time. In the experimental tests, we implemented the three schemes on an IBM SP2 parallel machine. The experimental results show that the CFS and ED schemes outperform the SFC scheme in most test cases, and that the ED scheme outperforms the CFS scheme in all test cases.
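The difference in phase ordering between SFC and CFS can be illustrated with a minimal single-process sketch. It assumes a block row partition and CRS compression, simulates "sending" by counting the elements that would be transferred, and uses hypothetical function names (`row_partition`, `crs_compress`, `sfc`, `cfs`) that are not taken from the paper. The ED scheme is left out because the abstract does not specify its encoding format.

```python
def crs_compress(rows):
    """Compress a dense 2D list into CRS arrays (row_ptr, col_idx, values)."""
    row_ptr, col_idx, values = [0], [], []
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))  # end offset of this row's nonzeros
    return row_ptr, col_idx, values


def row_partition(array, num_procs):
    """Split the rows into num_procs contiguous blocks (assumed block row partition)."""
    size = -(-len(array) // num_procs)  # ceiling division
    return [array[p * size:(p + 1) * size] for p in range(num_procs)]


def sfc(array, num_procs):
    """Send Followed Compress: dense blocks are sent, then each receiver compresses."""
    blocks = row_partition(array, num_procs)
    sent = sum(len(b) * len(b[0]) for b in blocks if b)  # dense elements transferred
    local = [crs_compress(b) for b in blocks]
    return local, sent


def cfs(array, num_procs):
    """Compress Followed Send: the host compresses each block, then sends the CRS arrays."""
    blocks = row_partition(array, num_procs)
    local = [crs_compress(b) for b in blocks]
    sent = sum(len(rp) + len(ci) + len(vs) for rp, ci, vs in local)  # CRS elements transferred
    return local, sent


# Illustrative 4x4 sparse array (made-up data).
A = [[0, 5, 0, 0],
     [0, 0, 0, 0],
     [3, 0, 0, 7],
     [0, 0, 2, 0]]

local_sfc, sent_sfc = sfc(A, 2)
local_cfs, sent_cfs = cfs(A, 2)
print(local_sfc == local_cfs, sent_sfc, sent_cfs)
```

Both orderings leave each processor with the same compressed local array; what changes is where the compression work happens and how much data crosses the network. The element counts here are only a rough proxy for communication cost and say nothing about the timings measured in the paper.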
