Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply
James Demmel | Richard W. Vuduc | Katherine A. Yelick | Shoaib Kamil | Benjamin C. Lee | Rajesh Nishtala
[1] Anne Lohrli. Chapman and Hall, 1985.
[2] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[3] Rafael Hector Saavedra-Barrera, et al. CPU performance evaluation and execution time prediction using narrow spectrum benchmarking, 1992.
[4] Ken Kennedy, et al. Compiler blockability of numerical algorithms, 1992, Proceedings Supercomputing '92.
[5] Olivier Temam, et al. Characterizing the behavior of sparse algorithms on caches, 1992, Proceedings Supercomputing '92.
[6] Richard Barrett, et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 1994, Other Titles in Applied Mathematics.
[7] Josep-Lluís Larriba-Pey, et al. Block algorithms for sparse matrix computations on high performance workstations, 1996, ICS '96.
[8] Richard F. Barrett, et al. Matrix Market: a web resource for test matrix collections, 1996, Quality of Numerical Software.
[9] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPL.
[10] Paul Vinson Stodghill, et al. A Relational Approach to the Automatic Generation of Sequential Sparse Matrix Codes, 1997.
[11] Sivan Toledo, et al. Improving the memory-system performance of sparse-matrix vector multiplication, 1997, IBM J. Res. Dev.
[12] P. Sadayappan, et al. On improving the performance of sparse matrix-vector multiplication, 1997, Proceedings Fourth International Conference on High-Performance Computing.
[13] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[14] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[15] Jeremy G. Siek, et al. A Rational Approach to Portable High Performance: The Basic Linear Algebra Instruction Set (BLAIS) and the Fixed Algorithm Size Template (FAST) Library, 1998, ECOOP Workshops.
[16] Todd L. Veldhuizen, et al. Arrays in Blitz++, 1998, ISCOPE.
[17] William Pugh, et al. SIPR: A New Framework for Generating Efficient Code for Sparse Matrix Computations, 1998, LCPC.
[18] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[19] Roman Geus, et al. Towards a fast parallel sparse matrix-vector multiplication, 2000, PARCO.
[20] Francisco F. Rivera, et al. Modeling and Improving Locality for Irregular Problems: Sparse Matrix-Vector Product on Cache Memories as a Cache Study, 1999, HPCN Europe.
[21] A. Pinar, et al. Improving Performance of Sparse Matrix-Vector Multiplication, 1999, ACM/IEEE SC 1999 Conference (SC'99).
[22] Sharad Malik, et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior, 1999, TOPL.
[23] Emilio L. Zapata, et al. Memory Hierarchy Performance Prediction for Blocked Sparse Algorithms, 1999, Parallel Process. Lett.
[24] Aart J. C. Bik, et al. Automatic Nonzero Structure Analysis, 1999, SIAM J. Comput.
[25] Eun Im, et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication, 2000.
[26] Dragan Mirkovic, et al. An adaptive software library for fast Fourier transforms, 2000, ICS '00.
[27] Jack J. Dongarra, et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[28] David E. Keyes, et al. Towards Realistic Performance Bounds for Implicit CFD Codes, 2000.
[29] Michele Colajanni, et al. PSBLAS: a library for parallel linear algebra computation on sparse matrices, 2000, TOMS.
[30] José M. F. Moura, et al. Fast Automatic Generation of DSP Algorithms, 2001, International Conference on Computational Science.
[31] Roldan Pozo, et al. NIST sparse BLAS user's guide, 2001.
[32] Greg M. Henry, et al. Flexible High-Performance Matrix Multiply via a Self-Modifying Runtime Code, 2001.
[33] William Kahan, et al. Document for the Basic Linear Algebra Subprograms (BLAS) standard: BLAS Technical Forum, 2001.
[34] P. Mannucci, et al. Abstract, 2003.
[35] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[36] Sathish S. Vadhiyar, et al. Towards an Accurate Model for Collective Communications, 2001, Int. J. High Perform. Comput. Appl.
[37] Katherine A. Yelick, et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY, 2001, International Conference on Computational Science.
[38] Robert A. van de Geijn, et al. FLAME: Formal Linear Algebra Methods Environment, 2001, TOMS.
[39] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[40] Siddhartha Chatterjee, et al. Exact analysis of the cache behavior of nested loops, 2001, PLDI '01.