A Study on Load Imbalance in Parallel Hypermatrix Multiplication Using OpenMP

In this paper we present our work on the parallelization of a matrix multiplication code based on the hypermatrix data structure. We parallelized the code with OpenMP, adding directives to a few loops and experimenting with several OpenMP features available in the Intel Fortran Compiler: scheduling algorithms, chunk sizes, and nested parallelism. We found that none of these features resolved the load imbalance introduced by the hypermatrix structure.
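To make the setting concrete, the following is a minimal sketch of the kind of loop being parallelized. It is written in C rather than the Fortran used in the work, and it is not the authors' code. It assumes a one-level hypermatrix stored as a grid of pointers to dense blocks, with a NULL pointer standing for an all-zero block. The names `hypermatrix`, `block_mult_add`, `hm_mult_add`, `NB`, and `BS` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NB 4   /* blocks per hypermatrix dimension (hypothetical size) */
#define BS 2   /* elements per block dimension (hypothetical size) */

/* Assumed layout: a grid of pointers to dense, row-major BS x BS blocks.
   A NULL pointer marks an all-zero block that is never stored. */
typedef struct {
    double *blk[NB][NB];
} hypermatrix;

/* c += a * b for dense BS x BS blocks. */
static void block_mult_add(const double *a, const double *b, double *c)
{
    for (int i = 0; i < BS; i++)
        for (int k = 0; k < BS; k++)
            for (int j = 0; j < BS; j++)
                c[i * BS + j] += a[i * BS + k] * b[k * BS + j];
}

/* C += A * B at the block level. Every block of C must be allocated.
   Returns the number of block products actually performed. */
static long hm_mult_add(const hypermatrix *A, const hypermatrix *B,
                        hypermatrix *C)
{
    long products = 0;

    /* Each iteration of i owns one block row of C, so no two threads
       write the same block. The schedule kind and chunk size are the
       knobs the text refers to; dynamic with chunk 1 is one choice. */
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:products)
    for (int i = 0; i < NB; i++) {
        for (int j = 0; j < NB; j++) {
            assert(C->blk[i][j] != NULL);
            for (int k = 0; k < NB; k++) {
                /* Skipping null blocks is what makes the work per
                   iteration uneven. */
                if (A->blk[i][k] == NULL || B->blk[k][j] == NULL)
                    continue;
                block_mult_add(A->blk[i][k], B->blk[k][j], C->blk[i][j]);
                products++;
            }
        }
    }
    return products;
}
```

With a block lower-triangular operand of 4 x 4 blocks multiplied by itself, block rows 0 through 3 of the result need 1, 3, 6, and 10 block products respectively. The iterations of the parallel loop therefore differ in cost by an order of magnitude, which illustrates how the block structure alone can produce imbalance. Compiled without OpenMP support, the pragma is ignored and the code runs serially.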
