Dense Matrix Computations on NUMA Architectures with Distance-Aware Work Stealing
Jesús Labarta | David E. Keyes | Hatem Ltaief | Xavier Martorell | Rosa M. Badia | Rabab Alomairy | Guillermo Miranda
[1] Jack J. Dongarra, et al. Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures, 2010, IEEE Transactions on Parallel and Distributed Systems.
[2] Alejandro Duran, et al. A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures, 2009, IWOMP.
[3] Karl-Filip Faxén, et al. Wool-A work stealing library, 2008, CARN.
[4] Emmanuel Agullo, et al. Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures, 2010, VECPAR.
[5] Emmanuel Jeannot, et al. Symbolic mapping and allocation for the Cholesky factorization on NUMA machines, 2013, Int. J. High Perform. Comput. Appl.
[6] Jack J. Dongarra, et al. Scheduling dense linear algebra operations on multicore processors, 2010, Concurr. Comput. Pract. Exp.
[7] John Shalf, et al. The International Exascale Software Project roadmap, 2011, Int. J. High Perform. Comput. Appl.
[8] Alejandro Duran, et al. A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks, 2009, International Journal of Parallel Programming.
[9] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[10] Jack Dongarra, et al. ScaLAPACK Users' Guide, 1987.
[11] Quan Chen, et al. LAWS: locality-aware work-stealing for multi-socket multi-core architectures, 2014, ICS '14.
[12] Vladimir Vlassov, et al. Locality-Aware Task Scheduling and Data Distribution on NUMA Systems, 2013, IWOMP.
[13] David E. Keyes, et al. Exaflop/s: The why and the how, 2011.
[14] Guillaume Mercier, et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[15] J. Dongarra, et al. Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures, LAPACK Working Note #209, 2008.
[16] Robert A. van de Geijn, et al. The libflame Library for Dense Matrix Computations, 2009, Computing in Science & Engineering.
[17] Jack J. Dongarra, et al. Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, 2008, IEEE Transactions on Parallel and Distributed Systems.
[18] Nicholas J. Higham, et al. INVERSE PROBLEMS NEWSLETTER, 1991.
[19] Jack Dongarra, et al. QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
[20] Lars Karlsson, et al. Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures, 2011, Parallel Comput.
[21] Karine Heydemann, et al. Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages, 2014, ACM Trans. Archit. Code Optim.
[22] Samuel Thibault, et al. Structuring the execution of OpenMP applications for multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[23] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[24] David E. Keyes, et al. Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Vivek Sarkar, et al. Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement, 2009, LCPC.
[26] Nicholas J. Higham, et al. Accuracy and stability of numerical algorithms, Second Edition, 2002.
[27] David E. Keyes, et al. Communication Complexity of the Fast Multipole Method and its Algebraic Variants, 2014, Supercomput. Front. Innov.
[28] Robert A. van de Geijn, et al. Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures, 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).
[29] Robert A. van de Geijn, et al. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, 2007, SPAA '07.