Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting
Jack J. Dongarra | Hatem Ltaief | Piotr Luszczek | Mathieu Faverge
[1] Jack J. Dongarra,et al. Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[2] Danny C. Sorensen,et al. Analysis of Pairwise Pivoting in Gaussian Elimination , 1985, IEEE Transactions on Computers.
[3] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[4] Jack J. Dongarra,et al. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[5] Jack J. Dongarra,et al. High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures , 2013, TOMS.
[6] Jack J. Dongarra,et al. Scheduling two-sided transformations using tile algorithms on multicore architectures , 2010, Sci. Program..
[7] David Abramson,et al. The Virtual Laboratory: a toolset to enable distributed molecular modelling for drug design on the World‐Wide Grid , 2003, Concurr. Comput. Pract. Exp..
[8] Erik Elmroth,et al. Design and Evaluation of Parallel Block Algorithms: LU Factorization on an IBM 3090 VF/600J , 1991, PPSC.
[9] Robert A. van de Geijn,et al. Managing the complexity of lookahead for LU factorization with pivoting , 2010, SPAA '10.
[10] Erik Elmroth,et al. New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems , 1998, PARA.
[11] Jack J. Dongarra,et al. A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[12] Jack J. Dongarra,et al. Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction , 2011, PPAM.
[13] Jack J. Dongarra,et al. Scheduling dense linear algebra operations on multicore processors , 2010, Concurr. Comput. Pract. Exp..
[14] Jack Dongarra,et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009 .
[15] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1997 .
[16] Dror Irony,et al. Communication-Efficient Parallel Dense LU Using a 3-Dimensional Approach , 2001, PPSC.
[17] Håkan Sundell,et al. Efficient and Practical Non-Blocking Data Structures , 2004 .
[18] E. Dijkstra. On the Role of Scientific Thought , 1982 .
[19] Julien Langou,et al. Parallel tiled QR factorization for multicore architectures , 2007, Concurr. Comput. Pract. Exp..
[20] Jesús Labarta,et al. A dependency-aware task-based programming environment for multi-core architectures , 2008, 2008 IEEE International Conference on Cluster Computing.
[21] Guillaume Mercier,et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[22] Emmanuel Agullo,et al. LU factorization for accelerator-based systems , 2011, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA).
[23] Jack J. Dongarra,et al. The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..
[24] E. L. Yip,et al. FORTRAN subroutines for out-of-core solutions of large complex linear systems , 1979 .
[25] Chris Reade,et al. Elements of functional programming , 1989, International computer science series.
[26] Jack Dongarra,et al. Scheduling dense linear algebra operations on multicore processors , 2010 .
[27] Emmanuel Agullo,et al. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[28] Robert A. van de Geijn,et al. Programming matrix algorithms-by-blocks for thread-level parallelism , 2009, TOMS.
[29] Jack J. Dongarra,et al. Evaluating Block Algorithm Variants in LAPACK , 1989, PPSC.
[30] Ken Kennedy,et al. Automatic blocking of QR and LU factorizations for locality , 2004, MSP '04.
[31] Jack Dongarra,et al. LAPACK Working Note 18: Implementation Guide for LAPACK , 1990 .
[32] Jack Dongarra,et al. QUARK Users' Guide: QUeueing And Runtime for Kernels , 2011 .
[33] Jack J. Dongarra,et al. EZTrace: A Generic Framework for Performance Analysis , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[34] Jerzy Wasniewski,et al. Recursive Version of LU Decomposition , 2000, NAA.
[35] Fred G. Gustavson,et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms , 1997, IBM J. Res. Dev..
[36] Victor Eijkhout,et al. Recursive approach in sparse matrix LU factorization , 2001, Sci. Program..
[37] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[38] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[39] Jack Dongarra,et al. Parallel tiled QR factorization for multicore architectures , 2008 .
[40] Jack J. Dongarra,et al. Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[41] Jack J. Dongarra,et al. Anatomy of a globally recursive embedded LINPACK benchmark , 2012, 2012 IEEE Conference on High Performance Extreme Computing.
[42] Edsger W. Dijkstra,et al. Selected Writings on Computing: A Personal Perspective , 1982, Texts and Monographs in Computer Science.
[43] John L. Gustafson,et al. Reevaluating Amdahl's law , 1988, CACM.