Optimizing the Performance of Sparse Matrix-Vector Multiplication
暂无分享,去创建一个
[1] Youcef Saad,et al. A Basic Tool Kit for Sparse Matrix Computations , 1990 .
[2] Yousef Saad,et al. P-Sparslib: A Portable Library Of Distributed Memory Sparse Iterative Solvers , 1995, PACT 1995.
[3] Yves Robert,et al. Static tiling for heterogeneous computing platforms , 1999, Parallel Comput..
[4] M. Sadkane. A block Arnoldi-Chebyshev method for computing the leading eigenpairs of large sparse unsymmetric matrices , 1993 .
[5] William Gropp,et al. PETSc 2.0 users manual , 2000 .
[6] Jitendra Malik,et al. Motion segmentation and tracking using normalized cuts , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).
[7] C. Loan,et al. A Storage-Efficient $WY$ Representation for Products of Householder Transformations , 1989 .
[8] Xiaoye Sherry Li,et al. Sparse Gaussian Elimination on High Performance Computers , 1996 .
[9] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1987 .
[10] E. Im,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, PPSC.
[11] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPL.
[12] Steven Mark Carr,et al. Memory-hierarchy management , 1993 .
[13] Bo Kågström,et al. Portable High Performance GEMM-Based Level 3 BLAS , 1993, PPSC.
[14] J. G. Lewis,et al. A Shifted Block Lanczos Algorithm for Solving Sparse Symmetric Generalized Eigenproblems , 1994, SIAM J. Matrix Anal. Appl..
[15] Steve Carr,et al. Compiler blockability of dense matrix factorizations , 1997, TOMS.
[16] J. R. Johnson,et al. Implementation of Strassen's Algorithm for Matrix Multiplication , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[17] Keshav Pingali,et al. An experimental evaluation of tiling and shackling for memory hierarchy management , 1999, ICS '99.
[18] James Demmel,et al. LAPACK Users' Guide, Third Edition , 1999, Software, Environments and Tools.
[19] John N. Shadid,et al. Aztec user`s guide. Version 1 , 1995 .
[20] Jack Dongarra,et al. Templates for the Solution of Algebraic Eigenvalue Problems , 2000, Software, environments, tools.
[21] Sivan Toledo,et al. Improving the memory-system performance of sparse-matrix vector multiplication , 1997, IBM J. Res. Dev..
[22] Ed Anderson,et al. LAPACK Users' Guide , 1995 .
[23] Roldan Pozo,et al. Template Numerical Toolkit for Linear Algebra: High Performance Programming With C++ and the Standard Template Library , 1997, Int. J. High Perform. Comput. Appl..
[24] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[25] K. Pingali,et al. Compiling Parallel Code for Sparse Matrix Applications , 1997, ACM/IEEE SC 1997 Conference (SC'97).
[26] Michael E. Wolf,et al. Improving locality and parallelism in nested loops , 1992 .
[27] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[28] Vipin Kumar,et al. Hmetis: a hypergraph partitioning package , 1998 .
[29] A. H. Sherman,et al. Comparative Analysis of the Cuthill–McKee and the Reverse Cuthill–McKee Ordering Algorithms for Sparse Matrices , 1976 .
[30] Karim Esseghir. Improving data locality for caches , 1993 .
[31] Alejandro L. Garcia,et al. Scientific Computing, An Introductory Survey , 1998 .
[32] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[33] Jack J. Dongarra,et al. An extended set of FORTRAN basic linear algebra subprograms , 1988, TOMS.
[34] Mark T. Jones,et al. BlockSolve95 users manual: Scalable library software for the parallel solution of sparse linear systems , 1995 .
[35] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[36] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[37] Matteo Frigo. A Fast Fourier Transform Compiler , 1999, PLDI.
[38] David R. O'Hallaron. Spark98: Sparse Matrix Kernels for Shared Memory and Message Passing Systems , 1997 .
[39] Leonid Oliker,et al. Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems , 2000, IPDPS Workshops.
[40] Aart J. C. Bik,et al. Compiler support for sparse matrix computations , 1996 .
[41] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[42] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[43] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[44] Ken Kennedy,et al. Improving the ratio of memory operations to floating-point operations in loops , 1994, TOPL.
[45] Bruce Hendrickson,et al. A Multi-Level Algorithm For Partitioning Graphs , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[46] M. Sadkane. Block-Arnoldi and Davidson methods for unsymmetric large eigenvalue problems , 1993 .
[47] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[48] C. L. Lawson,et al. Development of a block Lanczos algorithm for free vibration analysis of spinning structures , 1988 .
[49] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[50] Mithuna Thottethodi,et al. Tuning Strassen's Matrix Multiplication for Memory Efficiency , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[51] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[52] A. Lumsdaine,et al. A Sparse Matrix Library in C + + for High PerformanceArchitectures , 1994 .
[53] Steve Carr,et al. Combining optimization for cache and instruction-level parallelism , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.