Parallel Matrix Multiplication: A Systematic Journey
[1] Ramesh C. Agarwal,et al. A three-dimensional approach to parallel matrix multiplication , 1995, IBM J. Res. Dev..
[2] Geoffrey C. Fox,et al. Solving problems on concurrent processors: vol. 2 , 1990 .
[3] James Demmel,et al. Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1 , 2013, ArXiv.
[4] Rob H. Bisseling,et al. Scientific Computing on Bulk Synchronous Parallel Architectures , 1994, IFIP Congress.
[5] Robert A. van de Geijn,et al. Distributed memory matrix-vector multiplication and conjugate gradient algorithms , 1993, Supercomputing '93. Proceedings.
[6] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[7] S. Huss-Lederman,et al. Comparison of scalable parallel matrix multiplication libraries , 1993, Proceedings of Scalable Parallel Libraries Conference.
[8] Ramesh C. Agarwal,et al. A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication , 1994, IBM J. Res. Dev..
[9] W. Marsden. I and J , 2012 .
[10] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[11] Robert A. van de Geijn,et al. A flexible class of parallel matrix multiplication algorithms , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[12] Lynn Elliot Cannon,et al. A cellular computer to implement the kalman filter algorithm , 1969 .
[13] Bruce Hendrickson,et al. The Torus-Wrap Mapping for Dense Matrix Calculations on Massively Parallel Computers , 1994, SIAM J. Sci. Comput..
[14] James Demmel,et al. Communication lower bounds and optimal algorithms for numerical linear algebra , 2014, Acta Numerica.
[15] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[16] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience , 2007, Concurr. Comput. Pract. Exp..
[17] Rob H. Bisseling,et al. Parallel iterative solution of sparse linear systems on a transputer network , 1994 .
[18] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst..
[19] Jack Dongarra,et al. ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers , 1992, [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation.
[20] G. W. Stewart. Communication and matrix computations on large message passing systems , 1990, Parallel Comput..
[21] Grey Ballard,et al. Avoiding Communication in Dense Linear Algebra , 2013 .
[22] Jack Dongarra,et al. LAPACK Users' Guide, 3rd ed. , 1999 .
[23] Jaeyoung Choi,et al. Pumma: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers , 1994, Concurr. Pract. Exp..
[24] Guodong Zhang,et al. Matrix multiplication on the Intel Touchstone Delta , 1994, Concurr. Pract. Exp..
[25] H. Whitney,et al. An inequality related to the isoperimetric inequality , 1949 .
[26] G. C. Fox,et al. Solving Problems on Concurrent Processors , 1988 .
[27] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[28] Robert A. van de Geijn,et al. Elemental: A New Framework for Distributed Memory Dense Matrix Computations , 2013, TOMS.
[29] Jehoshua Bruck,et al. Efficient algorithms for all-to-all communications in multi-port message-passing systems , 1994, SPAA '94.
[30] Robert A. van de Geijn,et al. The libflame Library for Dense Matrix Computations , 2009, Computing in Science & Engineering.
[31] R. V. D. Geijn,et al. Parallel Matrix Distributions: Have we been doing it all wrong? , 1995 .
[32] Anthony Skjellum,et al. A poly‐algorithm for parallel dense matrix multiplication on two‐dimensional process grid topologies , 1997 .
[33] Steven J. Plimpton,et al. An Efficient Parallel Algorithm for Matrix-Vector Multiplication , 1995, Int. J. High Speed Comput..
[34] Alexander Tiskin,et al. Memory-Efficient Matrix Multiplication in the BSP Model , 1999, Algorithmica.
[35] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[36] Robert A. van de Geijn,et al. Using PLAPACK - parallel linear algebra package , 1997 .
[37] Martin D. Schatz. Anatomy of Parallel Computation with Tensors FLAME Working Note # 72 Ph , 2013 .