Performance Optimizations and Bounds for Sparse Matrix Kernels
暂无分享,去创建一个
Benjamin C. Lee | J. Demmel | K. Yelick | R. Vuduc | S. Kamil | R. Nishtala
[1] Michael F. P. O'Boyle,et al. The effect of cache models on iterative compilation for combined tiling and unrolling , 2004, Concurr. Comput. Pract. Exp..
[2] Siddhartha Chatterjee,et al. Exact analysis of the cache behavior of nested loops , 2001, PLDI '01.
[3] Larry Carter,et al. Rescheduling for Locality in Sparse Matrix Computations , 2001, International Conference on Computational Science.
[4] Larry Carter,et al. A Modal Model of Memory , 2001, International Conference on Computational Science.
[5] Katherine A. Yelick,et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.
[6] Roldan Pozo,et al. NIST sparse BLAS user's guide , 2001 .
[7] Greg M. Henry,et al. Flexible High-Performance Matrix Multiply via a Self-Modifying Runtime Code , 2001 .
[8] Jack J. Dongarra,et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[9] Eun Im,et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication , 2000 .
[10] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.
[11] Aart J. C. Bik,et al. Automatic Nonzero Structure Analysis , 1999, SIAM J. Comput..
[12] Francisco F. Rivera,et al. Modeling and Improving Locality for Irregular Problems: Sparse Matrix-Vector Product on Cache Memories as a Cache Study , 1999, HPCN Europe.
[13] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[14] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[15] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPL.
[16] Richard F. Barrett,et al. Matrix Market: a web resource for test matrix collections , 1996, Quality of Numerical Software.
[17] Josep-Lluís Larriba-Pey,et al. Block algorithms for sparse matrix computations on high performance workstations , 1996, ICS '96.
[18] Richard Barrett,et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.
[19] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[20] Olivier Temam,et al. Characterizing the behavior of sparse algorithms on caches , 1992, Proceedings Supercomputing '92.
[21] Rafael Hector Saavedra-Barrera,et al. CPU performance evaluation and execution time prediction using narrow spectrum benchmarking , 1992 .
[22] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.