Paper information - Distributed-memory H-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids the expensive $\Omega(P^2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while keeping the data well balanced. Based on this data distribution, our distributed-memory algorithm distributes all computations evenly among the $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $O\Big(\frac{N \log N}{P} + \alpha \log P + \beta \log^2 P \Big)$ for $\mathcal{H}$-matrices under the weak admissibility condition, where $N$ is the matrix size, $\alpha$ denotes the latency, and $\beta$ denotes the inverse bandwidth. In numerical experiments, we apply the algorithm to two- and three-dimensional problems of various sizes on various numbers of processes, and good parallel efficiency is still observed on thousands of processes.
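To make the complexity bound concrete, the following minimal sketch evaluates its three terms separately. It is an illustration only, not the authors' code: the function names `hmatvec_cost` and `parallel_efficiency`, the `flop_time` parameter, the choice of base-2 logarithms, and the decision to set every constant hidden by the $O(\cdot)$ notation to 1 are all assumptions made here.

```python
import math


def hmatvec_cost(n, p, alpha, beta, flop_time=1.0):
    """Return the (compute, latency, bandwidth) terms of the cost model.

    Assumption: all constants hidden by the O(.) bound are set to 1 and
    logarithms are base 2, so only the trends are meaningful.
    """
    log_p = math.log2(p) if p > 1 else 0.0
    compute = flop_time * n * math.log2(n) / p   # N log N / P
    latency = alpha * log_p                      # alpha * log P
    bandwidth = beta * log_p ** 2                # beta * log^2 P
    return compute, latency, bandwidth


def parallel_efficiency(n, p, alpha, beta, flop_time=1.0):
    """Modelled efficiency T(1) / (P * T(P)) under the same assumptions."""
    serial = sum(hmatvec_cost(n, 1, alpha, beta, flop_time))
    parallel = sum(hmatvec_cost(n, p, alpha, beta, flop_time))
    return serial / (p * parallel)


if __name__ == "__main__":
    # Hypothetical machine parameters, chosen only for illustration.
    for p in (1, 64, 4096):
        print(p, parallel_efficiency(n=2**24, p=p, alpha=1e3, beta=1e1))
```

In this model the compute term shrinks like $1/P$ while both communication terms grow only polylogarithmically in $P$, which is consistent with the abstract's observation that efficiency remains good at large process counts.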
