Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply
Katherine Yelick | James Demmel | Richard Vuduc | Rajesh Nishtala
[1] William Kahan,et al. Document for the Basic Linear Algebra Subprograms (BLAS) standard: BLAS Technical Forum , 2001 .
[2] A. Snavely,et al. Modeling application performance by convolving machine signatures with application profiles , 2001, Proceedings of the Fourth Annual IEEE International Workshop on Workload Characterization. WWC-4 (Cat. No.01EX538).
[3] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPLAS.
[4] Francisco F. Rivera,et al. Modeling and Improving Locality for Irregular Problems: Sparse Matrix-Vector Product on Cache Memories as a Case Study , 1999, HPCN Europe.
[5] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[6] Paul Vinson Stodghill,et al. A Relational Approach to the Automatic Generation of Sequential Sparse Matrix Codes , 1997 .
[7] Aart J. C. Bik,et al. Automatic Nonzero Structure Analysis , 1999, SIAM J. Comput.
[8] Siddhartha Chatterjee,et al. Exact analysis of the cache behavior of nested loops , 2001, PLDI '01.
[9] P. Mannucci,et al. Abstract , 2003 .
[10] James Demmel,et al. Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[11] Katherine A. Yelick,et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.
[12] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[13] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[14] Olivier Temam,et al. Characterizing the behavior of sparse algorithms on caches , 1992, Proceedings Supercomputing '92.
[15] Jack J. Dongarra,et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[16] Rafael Hector Saavedra-Barrera,et al. CPU performance evaluation and execution time prediction using narrow spectrum benchmarking , 1992 .
[17] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[18] Eun-Jin Im,et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication , 2000 .
[19] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.