Managing the complexity of lookahead for LU factorization with pivoting
[1] Robert A. van de Geijn et al. Design of scalable dense linear algebra libraries for multithreaded architectures: the LU factorization, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[2] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting, 1997, SIAM J. Matrix Anal. Appl.
[3] Robert A. van de Geijn et al. Parallel out-of-core computation and updating of the QR factorization, 2005, TOMS.
[4] Jack J. Dongarra et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.
[5] David A. Padua et al. Programming with tiles, 2008, PPoPP.
[6] Robert A. van de Geijn et al. Updating an LU Factorization with Pivoting, 2008, TOMS.
[7] P. Strazdins. A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization, 1998.
[8] Robert A. van de Geijn et al. SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, 2007, SPAA '07.
[9] James Demmel et al. Communication Avoiding Gaussian elimination, 2008, 2008 SC: International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] James Demmel et al. An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination, 1997, SIAM J. Matrix Anal. Appl.
[11] Tze Meng Low and Robert A. van de Geijn. An API for Manipulating Matrices Stored by Blocks, 2004.
[12] Ernie Chan et al. Runtime Data Flow Scheduling of Matrix Computations, FLAME Working Note #39, 2009.
[13] Robert A. van de Geijn et al. Representing linear algebra algorithms in code: the FLAME application program interfaces, 2005, TOMS.
[14] Erik Elmroth et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software, 2004, SIAM Review, Vol. 46, No. 1, pp. 3–45.
[15] Robert A. van de Geijn et al. SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks, 2008, PPoPP.
[16] Fred G. Gustavson et al. New Generalized Matrix Data Structures Lead to a Variety of High-Performance Algorithms, 2000, The Architecture of Scientific Software.
[17] Robert A. van de Geijn et al. FLAME: Formal Linear Algebra Methods Environment, 2001, TOMS.
[18] Katherine A. Yelick et al. Multi-threading and one-sided communication in parallel LU factorization, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[19] Robert A. van de Geijn et al. Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures, 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).
[20] Julien Langou et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[21] Julien Langou et al. Parallel tiled QR factorization for multicore architectures, 2007, Concurr. Comput. Pract. Exp.
[22] Jesús Labarta et al. A dependency-aware task-based programming environment for multi-core architectures, 2008, 2008 IEEE International Conference on Cluster Computing.
[23] Jack Dongarra et al. LINPACK Users' Guide, 1987.
[24] Ed Anderson et al. LAPACK Users' Guide, 1995.
[25] Apostolos Gerasoulis et al. Scheduling Linear Algebra Parallel Algorithms on MIMD Architectures, 1989, PPSC.
[26] Robert A. van de Geijn et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.
[27] Robert A. van de Geijn et al. Programming matrix algorithms-by-blocks for thread-level parallelism, 2009, TOMS.
[28] Cliff Addison et al. OpenMP issues arising in the development of parallel BLAS and LAPACK libraries, 2003, Sci. Program.
[29] Jack Dongarra et al. LAPACK Users' Guide (third ed.), 1999.