Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization
[1] James Demmel,et al. Communication-optimal Parallel and Sequential QR and LU Factorizations , 2008, SIAM J. Sci. Comput..
[2] Wolfgang Hackbusch,et al. A Sparse Matrix Arithmetic Based on H-Matrices. Part I: Introduction to H-Matrices , 1999, Computing.
[3] Sergej Rjasanow,et al. Adaptive Low-Rank Approximation of Collocation Matrices , 2003, Computing.
[4] Jean-Yves L'Excellent,et al. Improving Multifrontal Methods by Means of Block Low-Rank Representations , 2015, SIAM J. Sci. Comput..
[5] Yasuhito Takahashi,et al. Parallel Hierarchical Matrices with Adaptive Cross Approximation on Symmetric Multiprocessing Clusters , 2014, J. Inf. Process..
[6] Alfredo Buttari,et al. On the Complexity of the Block Low-Rank Multifrontal Factorization , 2017, SIAM J. Sci. Comput..
[7] Emmanuel Agullo,et al. Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model , 2017 .
[8] Stefan Kurz,et al. The adaptive cross-approximation technique for the 3D boundary-element method , 2002 .
[9] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[10] Nathan T. Hjelm,et al. Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs , 2019, 2019 IEEE International Conference on Cluster Computing (CLUSTER).
[11] Abhishek Gupta,et al. Parallel Programming with Migratable Objects: Charm++ in Practice , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] George Bosilca,et al. Hierarchical DAG Scheduling for Hybrid Distributed Systems , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[13] David E. Keyes,et al. Exploiting Data Sparsity for Large-Scale Matrix Computations , 2018, Euro-Par.
[14] Pavel Shamis,et al. Distributed Task-Based Runtime Systems - Current State and Micro-Benchmark Performance , 2018, 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[15] Eric Darve,et al. An O(N log N) Fast Direct Solver fo , 2013, Journal of Scientific Computing.
[16] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.
[17] W. Hackbusch,et al. On H2-Matrices , 2000 .
[18] Patrick R. Amestoy,et al. Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format , 2018 .
[19] Scott B. Baden,et al. The UPC++ PGAS library for Exascale Computing , 2017, PAW@SC.
[20] Thomas Hérault,et al. Dynamic task discovery in PaRSEC: a data-flow task-based runtime , 2017, ScalA@SC.
[21] Hatem Ltaief,et al. Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications , 2020, PASC.
[22] Patrick R. Amestoy,et al. Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures , 2019, ACM Trans. Math. Softw..
[23] Yasuhito Takahashi,et al. Software framework for parallel BEM analyses with H-matrices , 2016 .
[24] Shivkumar Chandrasekaran,et al. A Fast ULV Decomposition Solver for Hierarchically Semiseparable Representations , 2006, SIAM J. Matrix Anal. Appl..
[25] Thomas Hérault,et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability , 2013, Computing in Science & Engineering.
[26] Jack Dongarra,et al. Distributed-memory lattice H-matrix factorization , 2019, Int. J. High Perform. Comput. Appl..
[27] Laxmikant V. Kalé,et al. Multi-Level Load Balancing with an Integrated Runtime Approach , 2018, 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[28] Jesús Labarta,et al. Integrating Blocking and Non-Blocking MPI Primitives with Task-Based Programming Models , 2019, Parallel Comput..