The PHiPAC v1.0 Matrix-Multiply Distribution
[1] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[2] Ed Anderson, et al. LAPACK users' guide - [release 1.0], 1992.
[3] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[4] Jack J. Dongarra, et al. A proposal for a set of level 3 basic linear algebra subprograms, 1987, SGNM.
[5] Gene H. Golub, et al. Matrix computations, 1983.
[6] Chandrika Kamath, et al. DXML: A High-performance Scientific Subroutine Library, 1994, Digit. Tech. J.
[7] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.
[8] Bo Kågström, et al. Portable High Performance GEMM-Based Level 3 BLAS, 1993, PPSC.
[9] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[10] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991.
[11] Jacqueline Chame, et al. The combined effectiveness of unimodular transformations, tiling, and software prefetching, 1996, Proceedings of International Conference on Parallel Processing.
[12] Larry Carter, et al. Hierarchical tiling for improved superscalar performance, 1995, Proceedings of 9th International Parallel Processing Symposium.
[13] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[14] Bowen Alpern, et al. Space-limited procedures: a methodology for portable high-performance, 1995, Programming Models for Massively Parallel Computers.
[15] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, PARA.
[16] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.