Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
[1] Ken Kennedy, et al. A model and compilation strategy for out-of-core data parallel programs, 1995, PPOPP '95.
[2] Evan J. Englund. Matrix Inversion using Quadtrees Implemented in Gofer, 1995.
[3] James Demmel, et al. Stability of block algorithms with fast level-3 BLAS, 1992, TOMS.
[4] Monica S. Lam, et al. Data and computation transformations for multiprocessors, 1995, PPOPP '95.
[5] Jack J. Dongarra, et al. An extended set of FORTRAN basic linear algebra subprograms, 1988, TOMS.
[6] Ramesh Subramonian, et al. LogP: a practical model of parallel computation, 1996, CACM.
[7] Patrick C. Fischer, et al. Storage reorganization techniques for matrix computation in a paging environment, 1979, CACM.
[8] Donald E. Knuth, et al. The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition, 1997.
[9] David C. Cann, et al. Retire Fortran?: a debate rekindled, 1992, CACM.
[10] F. Warren Burton, et al. Comment on 'the explicit quad tree as a structure for computer graphics', 1983.
[11] David S. Wise. Undulant-Block Elimination and Integer-Preserving Matrix Inversion, 1999, Sci. Comput. Program.
[12] Paul Hudak, et al. A gentle introduction to Haskell, 1992, SIGP.
[13] Nicholas J. Higham, et al. Exploiting fast matrix multiplication within the level 3 BLAS, 1990, TOMS.
[14] David S. Wise. Representing matrices as quadtrees for parallel processors: extended abstract, 1984, SIGS.
[15] David C. Cann, et al. Retire Fortran? A debate rekindled, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[16] Steve Carr, et al. Compiler blockability of dense matrix factorizations, 1997, TOMS.
[17] V. Strassen. Gaussian elimination is not optimal, 1969.
[18] Donald E. Knuth. The art of computer programming: fundamental algorithms, 1969.
[19] Wei Li, et al. Unifying data and control transformations for distributed shared-memory machines, 1995, PLDI '95.
[20] Richard J. Fateman. Symbolic mathematics system evaluators (extended abstract), 1996, ISSAC '96.
[21] Edward G. Coffman, et al. Organizing matrices and matrix operations for paged memory systems, 1969, Commun. ACM.
[22] Guy L. Steele. Debunking the “expensive procedure call” myth or, procedure call implementations considered harmful or, LAMBDA: The Ultimate GOTO, 1977, ACM '77.
[23] K. A. Gallivan, et al. Parallel Algorithms for Dense Linear Algebra Computations, 1990, SIAM Rev.
[24] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.
[25] David S. Wise. Representing Matrices as Quadtrees for Parallel Processors, 1985, Inf. Process. Lett.