A Study on Load Imbalance in Parallel Hypermatrix Multiplication Using OpenMP

In this paper we present our work on the parallelization of a matrix multiplication code based on the hypermatrix data structure. We used OpenMP for the parallelization, adding OpenMP directives to a few loops and experimenting with several OpenMP features available in the Intel Fortran Compiler: scheduling algorithms, chunk sizes and nested parallelism. We found that none of these OpenMP features could solve the load imbalance introduced by the hypermatrix structure.
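To make the source of the imbalance concrete, here is a minimal sketch in C. It is not the authors' Fortran code. It assumes a two-level hypermatrix in which a top-level array of pointers refers to dense blocks and a NULL pointer marks an all-zero block that is never stored or multiplied. The names `hypermatrix`, `block_mult`, `row_work`, `hm_mult`, `NB` and `BS` are hypothetical. The OpenMP directive is placed on the outer block-row loop. Because the number of non-NULL block products differs from row to row, iterations carry unequal work, which is the kind of imbalance the paper studies. `schedule(runtime)` defers the choice of scheduling algorithm and chunk size to the `OMP_SCHEDULE` environment variable.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical two-level hypermatrix: NB x NB pointers to BS x BS
   dense blocks stored row-major. NULL marks an all-zero block. */
#define NB 4 /* blocks per dimension (illustrative value) */
#define BS 8 /* block size (illustrative value) */

typedef struct {
    double *blk[NB][NB];
} hypermatrix;

/* Dense kernel on one block triple: c += a * b. */
static void block_mult(double *c, const double *a, const double *b) {
    for (int i = 0; i < BS; i++)
        for (int k = 0; k < BS; k++) {
            double aik = a[i * BS + k];
            for (int j = 0; j < BS; j++)
                c[i * BS + j] += aik * b[k * BS + j];
        }
}

/* Work carried by block row i: the number of block products it will
   perform, i.e. pairs where both A(i,k) and B(k,j) are stored. */
static int row_work(const hypermatrix *A, const hypermatrix *B, int i) {
    int n = 0;
    for (int j = 0; j < NB; j++)
        for (int k = 0; k < NB; k++)
            if (A->blk[i][k] != NULL && B->blk[k][j] != NULL)
                n++;
    return n;
}

/* C += A * B over the block structure; returns the number of block
   products performed. Each thread owns whole block rows i, so writes
   to C->blk[i][*] never overlap between threads. Zero blocks are
   skipped, so per-iteration cost follows row_work(), not NB * NB.
   Without OpenMP enabled, the pragma is ignored and the loop is serial. */
static long hm_mult(hypermatrix *C, const hypermatrix *A, const hypermatrix *B) {
    long products = 0;
#pragma omp parallel for schedule(runtime) reduction(+ : products)
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++) {
                if (A->blk[i][k] == NULL || B->blk[k][j] == NULL)
                    continue; /* zero block: no work for this k */
                if (C->blk[i][j] == NULL) {
                    /* Allocate a zeroed result block on first use. */
                    C->blk[i][j] = calloc(BS * BS, sizeof(double));
                    if (C->blk[i][j] == NULL)
                        abort();
                }
                block_mult(C->blk[i][j], A->blk[i][k], B->blk[k][j]);
                products++;
            }
    return products;
}
```

With a block-lower-triangular `A` and a fully stored `B`, block row `i` performs `(i + 1) * NB` products, so the last row does `NB` times the work of the first. Compiled with OpenMP enabled, the schedule can be varied without recompiling, for example `OMP_SCHEDULE="dynamic,1"` or `OMP_SCHEDULE="static,2"`. Nested parallelism, the third feature the abstract mentions, would add an inner parallel region over `j`. That variant is not shown here.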

Juan J. Navarro | José R. Herrero
