A Study on Load Imbalance in Parallel Hypermatrix Multiplication Using OpenMP
[1] G. Fuchs et al. Hypermatrix solution of large sets of symmetric positive-definite linear equations. 1972.
[2] Ahmed K. Noor et al. Hypermatrix scheme for finite element systems on CDC STAR-100 computer. 1975.
[3] David S. Wise. Representing Matrices as Quadtrees for Parallel Processors. Inf. Process. Lett., 1985.
[4] Michael E. Wolf et al. The cache performance and optimizations of blocked algorithms. ASPLOS IV, 1991.
[5] Jack Dongarra et al. ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers. Proceedings of the Fourth Symposium on the Frontiers of Massively Parallel Computation, 1992.
[6] Jeremy D. Frens et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code. PPOPP '97, 1997.
[7] Jack J. Dongarra et al. Automatically Tuned Linear Algebra Software. Proceedings of the IEEE/ACM SC98 Conference, 1998.
[8] Mithuna Thottethodi et al. Nonlinear array layouts for hierarchical memory systems. ICS '99, 1999.
[9] Ken Kennedy et al. Improving memory hierarchy performance for irregular applications. ICS '99, 1999.
[10] Mithuna Thottethodi et al. Recursive array layouts and fast parallel matrix multiplication. SPAA '99, 1999.
[11] David S. Wise. Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free. Euro-Par, 2000.
[12] Anthony Skjellum et al. A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels. Concurr. Comput. Pract. Exp., 2002.
[13] Juan J. Navarro et al. Automatic Benchmarking and Optimization of Codes: An Experience with Numerical Kernels. Software Engineering Research and Practice, 2003.
[14] Juan J. Navarro et al. Improving Performance of Hypermatrix Cholesky Factorization. Euro-Par, 2003.