Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination
暂无分享,去创建一个
Jean-Guillaume Dumas | Clément Pernet | Thierry Gautier | Jean-Louis Roch | Ziad Sultan | T. Gautier | J. Dumas | Clément Pernet | Jean-Louis Roch | Ziad Sultan
[1] Jack Dongarra,et al. Numerical Linear Algebra for High-Performance Computers , 1998 .
[2] Alexandru Nicolau,et al. Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systems: Matrix-multiplication and matrix-addition algorithm optimizations by software pipelining and threads allocation , 2011, TOMS.
[3] Claude-Pierre Jeannerod,et al. Rank-profile revealing Gaussian elimination and the CUP matrix decomposition , 2011, J. Symb. Comput..
[4] Jean-Guillaume Dumas,et al. Dense Linear Algebra over Word-Size Prime Fields: the FFLAS and FFPACK Packages , 2006, TOMS.
[5] Jean-Guillaume Dumas,et al. Finite field linear algebra subroutines , 2002, ISSAC '02.
[6] Austin R. Benson,et al. A framework for practical parallel fast matrix multiplication , 2014, PPoPP.
[7] Jack J. Dongarra,et al. A proposal for a set of level 3 basic linear algebra subprograms , 1987, SGNM.
[8] David H. Bailey,et al. Using Strassen's algorithm to accelerate the solution of linear systems , 1991, The Journal of Supercomputing.
[9] Luk Bettale,et al. Cryptanalysis of HFE, multi-HFE and variants for odd and even characteristic , 2012, Designs, Codes and Cryptography.
[10] James Demmel,et al. CALU: A Communication Optimal LU Factorization Algorithm , 2011, SIAM J. Matrix Anal. Appl..
[11] Thierry Gautier,et al. libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms , 2012, IWOMP.
[12] Masha Sosonkina,et al. Locality Optimization on a NUMA Architecture for Hybrid LU Factorization , 2013, PARCO.
[13] Bruno Raffin,et al. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[14] Daniel E. Steffy,et al. Konrad-zuse-zentrum F ¨ Ur Informationstechnik Berlin Improving the Accuracy of Linear Programming Solvers with Iterative Refinement Improving the Accuracy of Linear Programming Solvers with Iterative Refinement * , 2022 .
[15] R. Gregory Taylor,et al. Modern computer algebra , 2002, SIGA.
[16] Z. Zlatev. Computational Methods for General Sparse Matrices , 1991 .
[17] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[18] Fred G. Gustavson,et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms , 1997, IBM J. Res. Dev..
[19] Jean-Guillaume Dumas,et al. Simultaneous computation of the row and column rank profiles , 2013, ISSAC '13.
[20] P. Sadayappan,et al. A tensor product formulation of Strassen's matrix multiplication algorithm with memory reduction , 1993, [1993] Proceedings Seventh International Parallel Processing Symposium.
[21] V. Strassen. Gaussian elimination is not optimal , 1969 .
[22] J. Faugère. A new efficient algorithm for computing Gröbner bases (F4) , 1999 .
[23] Jack J. Dongarra,et al. Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting , 2014, Concurr. Comput. Pract. Exp..
[24] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997, SIAM J. Matrix Anal. Appl..
[25] James Demmel,et al. Communication-Avoiding Parallel Strassen: Implementation and performance , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[26] William Stein,et al. Modular forms, a computational approach , 2007 .
[27] Jean-Guillaume Dumas,et al. Computing the Rank Profile Matrix , 2015, ISSAC.
[28] Robert A. van de Geijn,et al. Anatomy of a Parallel Out-of-Core Dense Linear Solver , 1995, ICPP.
[29] Jean-Guillaume Dumas,et al. Parallel Computation of Echelon Forms , 2014, Euro-Par.