Avoiding Communication in Computing Krylov Subspaces
暂无分享,去创建一个
[1] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[2] John Van Rosendale. Minimizing Inner Product Data Dependencies in Conjugate Gradient Iteration , 1983, ICPP.
[3] Y. Saad,et al. Practical Use of Polynomial Preconditionings for the Conjugate Gradient Method , 1985 .
[4] J K Peir. Program partitioning and synchronization on multiprocessor systems , 1986 .
[5] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[6] Anthony T. Chronopoulos,et al. s-step iterative methods for symmetric linear systems , 1989 .
[7] E. Sturler. A PARALLEL VARIANT OF GMRES(m) , 1991 .
[8] Michael E. Wolf,et al. Improving locality and parallelism in nested loops , 1992 .
[9] Sivan Toledo,et al. Efficient out-of-core algorithms for linear relaxation using blocking covers , 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.
[10] L. Reichel,et al. A Newton basis GMRES implementation , 1994 .
[11] Sivan Toledo,et al. Quantitative performance modeling of scientific computations and creating locality in numerical algorithms , 1995 .
[12] Frank Thomson Leighton,et al. Automatic methods for hiding latency in high bandwidth networks (extended abstract) , 1996, STOC '96.
[13] F. P. Preparata,et al. Processor—Time Tradeoffs under Bounded-Speed Message Propagation: Part I, Upper Bounds , 1995, Theory of Computing Systems.
[14] William W. Pugh,et al. Fine-grained analysis of array computations , 1998 .
[15] John D. McCalpin,et al. Time Skewing: A Value-Based Approach to Optimizing for Memory Locality , 1999 .
[16] Franco P. Preparata,et al. Processor—Time Tradeoffs under Bounded-Speed Message Propagation: Part II, Lower Bounds , 1999, Theory of Computing Systems.
[17] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[18] W. Hackbusch,et al. On H2-Matrices , 2000 .
[19] David G. Wonnacott,et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[20] Ulrich Rüde,et al. Cache Optimization for Structured and Unstructured Grid Multigrid , 2000 .
[21] Larry Carter,et al. Rescheduling for Locality in Sparse Matrix Computations , 2001, International Conference on Computational Science.
[22] 正人 木村. Max-Planck-Institute for Mathematics in the Sciences(海外,ラボラトリーズ) , 2001 .
[23] Chris H. Q. Ding,et al. A ghost cell expansion method for reducing communications in solving PDE problems , 2001, SC.
[24] Wolfgang Hackbusch,et al. Construction and Arithmetics of H-Matrices , 2003, Computing.
[25] Katherine Yelick,et al. UPC: Distributed Shared-Memory Programming , 2003 .
[26] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[27] Leonid Oliker,et al. Impact of modern memory subsystems on cache optimizations for stencil computations , 2005, MSP '05.
[28] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[29] David A. Patterson,et al. Latency Lags Bandwidth , 2005, ICCD.
[30] Telecommunications Board,et al. Getting Up to Speed: The Future of Supercomputing , 2005 .
[31] Samuel Williams,et al. Implicit and explicit optimizations for stencil computations , 2006, MSPC '06.
[32] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput..