Programming parallel dense matrix factorizations with look-ahead and OpenMP
Adrián Castelló | Enrique S. Quintana-Ortí | Francisco D. Igual | Sandra Catalán | Rafael Rodríguez-Sánchez
[1] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[2] Robert A. van de Geijn,et al. Updating an LU Factorization with Pivoting , 2008, TOMS.
[3] Pavan Balaji,et al. A Review of Lightweight Thread Approaches for High Performance Computing , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[4] Robert A. van de Geijn,et al. Programming matrix algorithms-by-blocks for thread-level parallelism , 2009, TOMS.
[5] Ed Anderson,et al. LAPACK Users' Guide , 1995 .
[6] Devang Shah,et al. Implementing Lightweight Threads , 1992, USENIX Summer.
[7] Gene H. Golub,et al. Matrix computations , 1983 .
[8] Enrique S. Quintana-Ortí,et al. A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting , 2016, IEEE Access.
[9] James Demmel,et al. Applied Numerical Linear Algebra , 1997 .
[10] Alex Brooks,et al. Argobots: A Lightweight Low-Level Threading and Tasking Framework , 2018, IEEE Transactions on Parallel and Distributed Systems.
[11] P. Strazdins. A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization , 1998 .
[12] Robert A. van de Geijn,et al. BLIS: A Framework for Rapidly Instantiating BLAS Functionality , 2015, ACM Trans. Math. Softw.
[13] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[14] Pavan Balaji,et al. GLT: A Unified API for Lightweight Thread Libraries , 2017, Euro-Par.
[15] Robert A. van de Geijn,et al. Parallel out-of-core computation and updating of the QR factorization , 2005, TOMS.
[16] Robert A. van de Geijn,et al. Anatomy of High-Performance Many-Threaded Matrix Multiplication , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[17] Robert A. van de Geijn,et al. The science of deriving dense linear algebra algorithms , 2005, TOMS.
[18] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput.
[19] Bruno Lang,et al. Efficient parallel reduction to bidiagonal form , 1999, Parallel Comput.
[20] Jesús Labarta,et al. Parallelizing dense and banded linear algebra libraries using SMPSs , 2009, Concurr. Comput. Pract. Exp.
[21] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[22] Pavan Balaji,et al. GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations , 2017, 2017 46th International Conference on Parallel Processing (ICPP).
[23] Field G. Van Zee,et al. BLIS: A Framework for Rapidly Instantiating BLAS Functionality , 2015 .
[24] Enrique S. Quintana-Ortí,et al. Two-Sided Reduction to Compact Band Forms with Look-Ahead , 2017, ArXiv.
[25] Rafael Mayo,et al. Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors , 2015, Cluster Computing.
[26] Tze Meng Low,et al. The BLIS Framework , 2016 .
[27] Adrián Castelló,et al. On the adequacy of lightweight thread approaches for high-level parallel programming models , 2018, Future Gener. Comput. Syst.
[28] Christian H. Bischof,et al. Algorithm 807: The SBR Toolbox—software for successive band reduction , 2000, TOMS.
[29] Robert A. van de Geijn,et al. High-performance implementation of the level-3 BLAS , 2008, TOMS.