Task-based Parallel Programming for Scalable Matrix Product Algorithms
[1] Jeffrey S. Vetter, et al. IRIS: A Portable Runtime System Exploiting Multiple Heterogeneous Programming Systems, 2021, 2021 IEEE High Performance Extreme Computing Conference (HPEC).
[2] Emmanuel Jeannot, et al. Using Dynamic Broadcasts to Improve Task-Based Runtime Performances, 2020, Euro-Par.
[3] Tsung-Wei Huang, et al. Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System, 2020, IEEE Transactions on Parallel and Distributed Systems.
[4] Jack Dongarra, et al. Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC, 2019, 2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA).
[5] Xiaoye S. Li, et al. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems, 2019, J. Parallel Distributed Comput.
[6] Emmanuel Agullo, et al. Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, 2017.
[7] Eduard Ayguadé, et al. Improving the Integration of Task Nesting and Dependencies in OpenMP, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[8] Martin D. Schatz, et al. Parallel Matrix Multiplication: A Systematic Journey, 2016, SIAM J. Sci. Comput.
[9] George Bosilca, et al. Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal Solver, 2016, Euro-Par Workshops.
[10] Emmanuel Agullo, et al. Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, 2016, ACM Trans. Math. Softw.
[11] Thomas Hérault, et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability, 2013, Computing in Science & Engineering.
[12] Robert A. van de Geijn, et al. Scheduling algorithms-by-blocks on small clusters, 2013, Concurr. Comput. Pract. Exp.
[13] Katherine A. Yelick, et al. Communication avoiding and overlapping for numerical linear algebra, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] James Demmel, et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, 2011, Euro-Par.
[15] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[16] Robert A. van de Geijn, et al. Programming matrix algorithms-by-blocks for thread-level parallelism, 2009, TOMS.
[17] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[18] James Demmel, et al. Minimizing Communication in Numerical Linear Algebra, 2009, SIAM J. Matrix Anal. Appl.
[19] James Demmel, et al. Communication-optimal Parallel and Sequential QR and LU Factorizations, 2008, SIAM J. Sci. Comput.
[20] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[21] Dror Irony, et al. Communication lower bounds for distributed-memory matrix multiplication, 2004, J. Parallel Distributed Comput.
[22] Robert A. van de Geijn, et al. Using PLAPACK - parallel linear algebra package, 1997.
[23] James Demmel, et al. ScaLAPACK: A Linear Algebra Library for Message-Passing Computers, 1997, PPSC.
[24] Ramesh C. Agarwal, et al. A three-dimensional approach to parallel matrix multiplication, 1995, IBM J. Res. Dev.
[25] Robert A. van de Geijn, et al. SUMMA: scalable universal matrix multiplication algorithm, 1995, Concurr. Pract. Exp.
[26] Ramesh C. Agarwal, et al. A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication, 1994, IBM J. Res. Dev.
[27] Asim YarKhan, et al. Dynamic Task Execution on Shared and Distributed Memory Architectures, 2012.
[28] Padma Raghavan, et al. Parallel Processing for Scientific Computing, 2006, Software, Environments, Tools.
[29] Patrick Amestoy, et al. A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling, 2001, SIAM J. Matrix Anal. Appl.
[30] Cleve Ashcraft, et al. The Fan-Both Family of Column-Based Distributed Cholesky Factorization Algorithms, 1993.
[31] Lynn Elliot Cannon, et al. A cellular computer to implement the Kalman filter algorithm, 1969.