Avoiding communication in sparse matrix computations
暂无分享,去创建一个
James Demmel | Katherine A. Yelick | Mark Hoemmen | Marghoob Mohiyuddin | J. Demmel | K. Yelick | M. Hoemmen | M. Mohiyuddin
[1] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[2] John Van Rosendale. Minimizing Inner Product Data Dependencies in Conjugate Gradient Iteration , 1983, ICPP.
[3] Y. Saad,et al. Practical Use of Polynomial Preconditionings for the Conjugate Gradient Method , 1985 .
[4] J K Peir. Program partitioning and synchronization on multiprocessor systems , 1986 .
[5] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[6] Anthony T. Chronopoulos,et al. s-step iterative methods for symmetric linear systems , 1989 .
[7] R Vichnevetsky,et al. IMACS '91: Proceedings of the IMACS World Congress on Computation and Applied Mathematics (13th) Held in Dublin, Ireland on July 22-26, 1991. Volume 2. Computational Fluid Dynamics and Wave Propagation, Parallel Computing, Concurrent and Supercomputing, Computational Physics/Computational Chemistry , 1991 .
[8] E. Sturler. A PARALLEL VARIANT OF GMRES(m) , 1991 .
[9] Michael E. Wolf,et al. Improving locality and parallelism in nested loops , 1992 .
[10] Sivan Toledo,et al. Efficient out-of-core algorithms for linear relaxation using blocking covers , 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.
[11] L. Reichel,et al. A Newton basis GMRES implementation , 1994 .
[12] Sivan Toledo,et al. Quantitative performance modeling of scientific computations and creating locality in numerical algorithms , 1995 .
[13] William W. Pugh,et al. Fine-grained analysis of array computations , 1998 .
[14] John D. McCalpin,et al. Time Skewing: A Value-Based Approach to Optimizing for Memory Locality , 1999 .
[15] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[16] David G. Wonnacott,et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[17] Ulrich Rüde,et al. Cache Optimization for Structured and Unstructured Grid Multigrid , 2000 .
[18] Larry Carter,et al. Rescheduling for Locality in Sparse Matrix Computations , 2001, International Conference on Computational Science.
[19] Chris H. Q. Ding,et al. A ghost cell expansion method for reducing communications in solving PDE problems , 2001, SC.
[20] Katherine Yelick,et al. UPC: Distributed Shared-Memory Programming , 2003 .
[21] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[22] David A. Patterson,et al. Latency lags bandwith , 2004, CACM.
[23] Leonid Oliker,et al. Impact of modern memory subsystems on cache optimizations for stencil computations , 2005, MSP '05.
[24] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[25] David A. Patterson,et al. Latency Lags Bandwidth , 2005, ICCD.
[26] Telecommunications Board,et al. Getting Up to Speed: The Future of Supercomputing , 2005 .
[27] Samuel Williams,et al. Implicit and explicit optimizations for stencil computations , 2006, MSPC '06.
[28] J. Demmel,et al. Avoiding Communication in Computing Krylov Subspaces , 2007 .
[29] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput..