The performance impact of data reuse in parallel dense Cholesky factorization
[1] Y. Saad, et al. Communication complexity of the Gaussian elimination algorithm on multiprocessors, 1986.
[2] G. A. Geist, et al. Parallel Cholesky factorization on a hypercube multiprocessor, 1985.
[3] A. George, et al. Parallel Cholesky factorization on a shared-memory multiprocessor. Final report, 1 October 1986-30 September 1987, 1986.
[4] Anoop Gupta, et al. Design of scalable shared-memory multiprocessors: the DASH approach, 1990, Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage.
[5] Jack Dongarra, et al. LAPACK: a portable linear algebra library for high-performance computers, 1990, SC.
[6] G. Stewart, et al. Assignment and scheduling in parallel matrix factorization, 1986.
[7] William Jalby, et al. Impact of Hierarchical Memory Systems On Linear Algebra Algorithm Design, 1988.
[8] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.
[9] Vijay K. Naik, et al. Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems, 1989, ICS '89.
[10] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[11] Anoop Gupta, et al. Techniques for improving the performance of sparse matrix factorization on multiprocessor workstations, 1990, Proceedings SUPERCOMPUTING '90.
[12] J. Dongarra. Performance of various computers using standard linear equations software, 1990, CARN.