Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures

Matrices arising from elliptic partial differential equations have been shown to have a low-rank property that multifrontal solvers can exploit to substantially reduce their complexity. Among the possible low-rank formats, the Block Low-Rank (BLR) format is easy to use in a general-purpose multifrontal solver, and its potential compared to standard (full-rank) solvers has been demonstrated. Recently, new variants have been introduced and proved to further reduce the complexity, but their performance has never been analyzed. In this article, we present a multithreaded BLR factorization and analyze its efficiency and scalability in shared-memory multicore environments. We identify the challenges posed by the use of BLR approximations in multifrontal solvers and put forward several algorithmic variants of the BLR factorization that overcome these challenges by improving its efficiency and scalability. We illustrate the performance analysis of the BLR multifrontal factorization with numerical experiments on a large set of problems coming from a variety of real-life applications.
