[1] Emmanuel Agullo,et al. Tile QR factorization with parallel panel processing for multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[2] David A. Padua,et al. On the Automatic Parallelization of the Perfect Benchmarks , 1998, IEEE Trans. Parallel Distributed Syst..
[3] Frédéric Suter,et al. Mixed parallel implementations of the top level step of Strassen and Winograd matrix multiplication algorithms , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[4] Robert A. van de Geijn,et al. Programming matrix algorithms-by-blocks for thread-level parallelism , 2009, TOMS.
[5] Don Coppersmith,et al. Matrix multiplication via arithmetic progressions , 1987, STOC.
[6] V. Strassen. Gaussian elimination is not optimal , 1969 .
[7] Emmanuel Agullo,et al. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[8] Victor Y. Pan,et al. Strassen's algorithm is not optimal: trilinear technique of aggregating, uniting and canceling for constructing fast algorithms for matrix operations , 1978, 19th Annual Symposium on Foundations of Computer Science (sfcs 1978).
[9] Michael A. Heroux,et al. GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm , 1994, Journal of Computational Physics.
[10] E. Kay,et al. Graph Theory. An Algorithmic Approach , 1975 .
[11] David J. Kuck,et al. On Stable Parallel Linear System Solvers , 1978, JACM.
[12] S. P. Kumar,et al. Solving Linear Algebraic Equations on an MIMD Computer , 1983, JACM.
[13] Arnold Schönhage,et al. Partial and Total Matrix Multiplication , 1981, SIAM J. Comput..
[14] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[15] Emmanuel Agullo,et al. A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures , 2011, Euro-Par.
[16] R. Clint Whaley,et al. Achieving accurate and context‐sensitive timing for code optimization , 2008, Softw. Pract. Exp..
[17] Julien Langou,et al. Parallel tiled QR factorization for multicore architectures , 2007, Concurr. Comput. Pract. Exp..
[18] Jack Dongarra,et al. Enhancing Parallelism of Tile QR Factorization for Multicore Architectures , 2010 .
[19] Yves Robert. The Impact of Vector and Parallel Architectures on the Gaussian Elimination Algorithm , 1991 .
[20] Yves Robert,et al. Complexity of parallel QR factorization , 1986, JACM.
[21] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1987 .
[22] Robert A. van de Geijn,et al. Families of algorithms related to the inversion of a Symmetric Positive Definite matrix , 2008, TOMS.
[23] Yves Robert,et al. Tiled QR factorization algorithms , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[24] Jack Dongarra,et al. QR factorization for the Cell Broadband Engine , 2009, HiPC 2009.
[25] James Demmel,et al. the Parallel Computing Landscape , 2022 .
[26] Emmanuel Agullo,et al. Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures , 2010, VECPAR.
[27] Monica S. Lam,et al. Jade: a high-level, machine-independent language for parallel programming , 1993, Computer.
[28] Herb Sutter,et al. A Fundamental Turn Toward Concurrency in Software , 2008 .
[29] U. B. Vemulapati,et al. QR Factorization , 2009, Encyclopedia of Optimization.
[30] Julien Langou,et al. A Critical Path Approach to Analyzing Parallelism of Algorithmic Variants. Application to Cholesky Inversion , 2010, ArXiv.
[31] Shmuel Winograd,et al. On multiplication of 2 × 2 matrices , 1971 .
[32] Henri Casanova,et al. Parallel Algorithms , 2019, Design and Analysis of Algorithms.
[33] James Demmel,et al. Communication-optimal parallel algorithm for Strassen's matrix multiplication , 2012, SPAA '12.
[34] Jesús Labarta,et al. CellSs: Making it easier to program the Cell Broadband Engine processor , 2007, IBM J. Res. Dev..
[35] Emmanuel Agullo,et al. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[36] Francesco Romani,et al. Some Properties of Disjoint Sums of Tensors Related to Matrix Multiplication , 1982, SIAM J. Comput..
[37] Jean-Guillaume Dumas,et al. Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm , 2007, ISSAC '09.
[38] J. J. Modi,et al. An alternative Givens ordering , 1984 .
[39] Yuefan Deng,et al. Parallelizing Strassen's method for matrix multiplication on distributed-memory MIMD architectures , 1995 .
[40] Grazia Lotti,et al. O(n^2.7799) Complexity for n × n Approximate Matrix Multiplication , 1979, Inf. Process. Lett..
[41] Jack Dongarra,et al. Fully Dynamic Scheduler for Numerical Computing on Multicore Processors , 2009 .
[42] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[43] M. Cosnard,et al. Parallel QR decomposition of a rectangular matrix , 1986 .
[44] Nicholas J. Higham,et al. INVERSE PROBLEMS NEWSLETTER , 1991 .
[45] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[46] Nicholas J. Higham,et al. Stable and Efficient Spectral Divide and Conquer Algorithms for the Symmetric Eigenvalue Decomposition and the SVD , 2013, SIAM J. Sci. Comput..
[47] Yves Robert,et al. Optimal algorithms for Gaussian elimination on an MIMD computer , 1989, Parallel Comput..
[48] James Demmel,et al. LAPACK Users' Guide, Third Edition , 1999, Software, Environments and Tools.