Parallel Memory-Independent Communication Bounds for SYRK
[1] Olivier Beaumont,et al. Symmetric Block-Cyclic Distribution: Fewer Communications Leads to Faster Dense Cholesky Factorization , 2022, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Hussam Al Daas,et al. Brief Announcement: Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds , 2022, SPAA.
[3] Olivier Beaumont,et al. I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels , 2022, SPAA.
[4] Alexandros Nikolaos Ziogas,et al. On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations , 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] P. Sadayappan,et al. IOOpt: automatic derivation of I/O complexity bounds for affine programs , 2021, PLDI.
[6] Julien Langou,et al. Automated derivation of parametric data movement lower bounds for affine programs , 2019, PLDI.
[7] R. van de Geijn,et al. A Tight I/O Lower Bound for Matrix Multiplication , 2017.
[8] James Demmel,et al. Communication lower bounds and optimal algorithms for numerical linear algebra , 2014, Acta Numerica.
[9] James Demmel,et al. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[10] Robert A. van de Geijn,et al. Elemental: A New Framework for Distributed Memory Dense Matrix Computations , 2013, TOMS.
[11] James Demmel,et al. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds , 2012, SPAA '12.
[12] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..
[13] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience , 2007, Concurr. Comput. Pract. Exp..
[14] Yves Robert,et al. Revisiting Matrix Product on Master-Worker Platforms , 2006, 2007 IEEE International Parallel and Distributed Processing Symposium.
[15] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[16] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[17] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[18] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst..
[19] Alok Aggarwal,et al. Communication Complexity of PRAMs , 1990, Theor. Comput. Sci..
[20] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[21] H. Whitney,et al. An inequality related to the isoperimetric inequality , 1949.
[22] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1987.