Automatic Performance Tuning and Analysis of Sparse Triangular Solve
暂无分享,去创建一个
Katherine Yelick | James Demmel | Rajesh Nishtala | Richard Vuduc Shoaib | Kamil Jen Hsu | J. Demmel | K. Yelick | R. Nishtala
[1] Stefan Andersson,et al. RS/6000 Scientific and Technical Computing: POWER3 Introduction and Tuning Guide , 1998 .
[2] Edward Rothberg,et al. Alternatives for Solving Sparse Triangular Systems on Distributed-Memory Multiprocessors , 1995, Parallel Comput..
[3] Fred G. Gustavson,et al. LAWRA: Linear Algebra with Recursive Algorithms , 2000, PARA.
[4] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[5] William Kahan,et al. Document for the Basic Linear Algebra Subprograms (BLAS) standard: BLAS Technical Forum , 2001 .
[6] Patrick Amestoy,et al. A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling , 2001, SIAM J. Matrix Anal. Appl..
[7] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[8] Todd L. Veldhuizen,et al. Arrays in Blitz++ , 1998, ISCOPE.
[9] Dragan Mirkovic,et al. An adaptive software library for fast Fourier transforms , 2000, ICS '00.
[10] Michael T. Heath,et al. Performance of Parallel Sparse Triangular Solution , 1999 .
[11] Sathish S. Vadhiyar,et al. Towards an Accurate Model for Collective Communications , 2001, Int. J. High Perform. Comput. Appl..
[12] Aart J. C. Bik,et al. Automatic Nonzero Structure Analysis , 1999, SIAM J. Comput..
[13] Rafael Hector Saavedra-Barrera,et al. CPU performance evaluation and execution time prediction using narrow spectrum benchmarking , 1992 .
[14] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[15] James Demmel,et al. Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[16] Richard F. Barrett,et al. Matrix Market: a web resource for test matrix collections , 1996, Quality of Numerical Software.
[17] Roldan Pozo,et al. NIST sparse BLAS user's guide , 2001 .
[18] Eun Im,et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication , 2000 .
[19] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[20] Siddhartha Chatterjee,et al. Exact analysis of the cache behavior of nested loops , 2001, PLDI '01.
[21] P. Mannucci,et al. Abstract , 2003 .
[22] Francisco F. Rivera,et al. Modeling and Improving Locality for Irregular Problems: Sparse Matrix-Vector Product on Cache Memories as a Cache Study , 1999, HPCN Europe.
[23] Anoop Gupta,et al. Parallel ICCG on a hierarchical memory multiprocessor - Addressing the triangular solve bottleneck , 1990, Parallel Comput..
[24] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[25] Vipin Kumar,et al. PSPASES: An Efficient and Scalable Parallel Sparse Direct Solver , 1999, PPSC.
[26] Richard Barrett,et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.
[27] Padma Raghavan,et al. Efficient Parallel Sparse Triangular Solution Using Selective Inversion , 1998, Parallel Process. Lett..
[28] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[29] Paul Vinson Stodghill,et al. A Relational Approach to the Automatic Generation of Sequential Sparse matrix Codes , 1997 .
[30] Cleve Ashcraft,et al. SPOOLES: An Object-Oriented Sparse Matrix Library , 1999, PPSC.
[31] Timothy A. Davis,et al. An Unsymmetric-pattern Multifrontal Method for Sparse Lu Factorization , 1993 .
[32] Michele Colajanni,et al. PSBLAS: a library for parallel linear algebra computation on sparse matrices , 2000, TOMS.
[33] Vipin Kumar,et al. Parallel Algorithms for Forward Elimination and Backward Substitution in Direct Solution of Sparse L , 1995 .
[34] Fernando L. Alvarado,et al. Optimal Parallel Solution of Sparse Triangular Systems , 1993, SIAM J. Sci. Comput..
[35] Josep-Lluís Larriba-Pey,et al. Block algorithms for sparse matrix computations on high performance workstations , 1996, ICS '96.
[36] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[37] Jeremy G. Siek,et al. A Rational Approach to Portable High Performance: The Basic Linear Algebra Instruction Set (BLAIS) and the Fixed Algorithm Size Template (FAST) Library , 1998, ECOOP Workshops.
[38] José M. F. Moura,et al. Fast Automatic Generation of DSP Algorithms , 2001, International Conference on Computational Science.
[39] Katherine A. Yelick,et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.
[40] Jack J. Dongarra,et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[41] A. Gupta,et al. Parallel Algorithms for Forward and Back Substitution in Direct Solution of Sparse Linear Systems , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[42] James Demmel,et al. A Supernodal Approach to Sparse Partial Pivoting , 1999, SIAM J. Matrix Anal. Appl..
[43] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[44] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[45] Eunice E. Santos. Solving triangular linear systems in parallel using substitution , 1995, Proceedings.Seventh IEEE Symposium on Parallel and Distributed Processing.
[46] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.
[47] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPL.
[48] Olivier Temam,et al. Characterizing the behavior of sparse algorithms on caches , 1992, Proceedings Supercomputing '92.
[49] Thomas F. Coleman,et al. A parallel triangular solver for distributed-memory multiprocessor , 1988 .
[50] Vipin Kumar,et al. A high performance two dimensional scalable parallel algorithm for solving sparse triangular systems , 1997, Proceedings Fourth International Conference on High-Performance Computing.