On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties
Jack Dongarra | Piotr Luszczek | Ichitaro Yamazaki | Jakub Kurzak | Mathieu Faverge | Mark Gates | Simplice Donfack
[1] J. Hess,et al. Calculation of potential flow about arbitrary bodies , 1967 .
[2] Jack J. Dongarra,et al. Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[3] Danny C. Sorensen,et al. Analysis of Pairwise Pivoting in Gaussian Elimination , 1985, IEEE Transactions on Computers.
[4] Jack J. Dongarra,et al. Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction , 2011, PPAM.
[5] Jack J. Dongarra,et al. Scheduling dense linear algebra operations on multicore processors , 2010, Concurr. Comput. Pract. Exp..
[6] J. Hess. Panel Methods in Computational Fluid Dynamics , 1990 .
[7] Robert A. van de Geijn,et al. Programming matrix algorithms-by-blocks for thread-level parallelism , 2009, TOMS.
[8] R. Aymar,et al. Overview of ITER-FEAT - The future international burning plasma experiment , 2001 .
[9] R. Harrington. Origin and development of the method of moments for field computation , 1990, IEEE Antennas and Propagation Magazine.
[10] Lars Karlsson,et al. Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion , 2012, TOMS.
[11] E. L. Yip,et al. FORTRAN subroutines for out-of-core solutions of large complex linear systems , 1979 .
[12] James Demmel,et al. Applied Numerical Linear Algebra , 1997 .
[13] A Randomizing Butterfly Transformation Useful in Block Matrix Computations , 1995 .
[14] Jack J. Dongarra,et al. Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures , 2010, IEEE Transactions on Parallel and Distributed Systems.
[15] Jack J. Dongarra,et al. A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[16] Jack Dongarra,et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009 .
[17] Fred G. Gustavson,et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms , 1997, IBM J. Res. Dev..
[18] L. Foster. Gaussian Elimination with Partial Pivoting Can Fail in Practice , 1994, SIAM J. Matrix Anal. Appl..
[19] James Demmel,et al. Communication avoiding Gaussian elimination , 2008, HiPC 2008.
[20] David Smithe,et al. Global-wave solutions with self-consistent velocity distributions in ion cyclotron heated plasmas , 2006 .
[21] Emmanuel Agullo,et al. LU factorization for accelerator-based systems , 2011, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA).
[22] James Demmel,et al. CALU: A Communication Optimal LU Factorization Algorithm , 2011, SIAM J. Matrix Anal. Appl..
[23] Jack J. Dongarra,et al. Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[24] L. Trefethen,et al. Average-case stability of Gaussian elimination , 1990 .
[25] Richard F. Barrett,et al. Complex version of high performance computing LINPACK benchmark (HPL) , 2010 .
[26] Jack J. Dongarra,et al. Exploiting Fine-Grain Parallelism in Recursive LU Factorization , 2011, PARCO.
[27] Joseph F. Grcar,et al. Mathematicians of Gaussian Elimination , 2011 .
[28] Gene H. Golub,et al. Matrix computations , 1983 .
[29] Jack J. Dongarra,et al. Accelerating Linear System Solutions Using Randomization Techniques , 2013, TOMS.
[30] Jack J. Dongarra,et al. High performance matrix inversion based on LU factorization for multicore architectures , 2011, MTAGS '11.
[31] James Demmel,et al. Error bounds from extra-precise iterative refinement , 2006, TOMS.
[32] E. D'Azevedo,et al. Sheared poloidal flow driven by mode conversion in tokamak plasmas , 2003, Physical Review Letters.
[33] Jack Dongarra,et al. QUARK Users' Guide: QUeueing And Runtime for Kernels , 2011 .
[34] Mei Han An,et al. Accuracy and Stability of Numerical Algorithms , 1991 .
[35] Jack J. Dongarra,et al. High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures , 2013, TOMS.
[36] J. H. Wilkinson. The algebraic eigenvalue problem , 1966 .
[37] Emmanuel Agullo,et al. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[38] Julien Langou,et al. The Impact of Multicore on Math Software , 2006, PARA.
[39] Johnson J. H. Wang. Generalized Moment Methods in Electromagnetics: Formulation and Computer Solution of Integral Equations , 1991 .
[40] Cleve B. Moler,et al. Iterative Refinement in Floating Point , 1967, JACM.
[41] Laura Grigori,et al. Adapting communication-avoiding LU and QR factorizations to multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[42] Jack Dongarra,et al. Parallel tiled QR factorization for multicore architectures , 2008 .
[43] G. Stewart. Introduction to matrix computations , 1973 .
[44] Jack J. Dongarra,et al. Anatomy of a globally recursive embedded LINPACK benchmark , 2012, 2012 IEEE Conference on High Performance Extreme Computing.
[45] T. Chan,et al. Probabilistic Analysis of Gaussian Elimination Without Pivoting , 1997 .
[46] J. Demmel,et al. Implementing Communication-Optimal Parallel and Sequential QR Factorizations , 2008, arXiv:0809.2407.
[47] Victor Eijkhout,et al. Recursive approach in sparse matrix LU factorization , 2001, Sci. Program..
[48] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[49] Alan Edelman,et al. Large Dense Numerical Linear Algebra in 1993: the Parallel Computing Influence , 1993, Int. J. High Perform. Comput. Appl..
[50] Jack J. Dongarra,et al. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[51] Eduardo F. D'Azevedo,et al. Advances in full-wave modeling of radio frequency heated, multidimensional plasmas , 2002 .
[52] J. Dongarra,et al. Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures, LAPACK Working Note #209 , 2008 .