A massively parallel and memory-efficient FEM toolbox with a hybrid total FETI solver with accelerator support

In this article, we present the ExaScale PaRallel finite element tearing and interconnecting SOlver (ESPRESO) finite element method (FEM) library, which includes an FEM toolbox with interfaces to professional and open-source simulation tools, and a massively parallel hybrid total finite element tearing and interconnecting (HTFETI) solver that can fully utilize the Oak Ridge Leadership Computing Facility Titan supercomputer and achieve superlinear scaling. We present several new techniques for finite element tearing and interconnecting (FETI) solvers designed for efficient utilization of supercomputers, with a focus on (i) performance, where we present a fivefold reduction of solver runtime for the Laplace equation, obtained by redesigning the FETI solver and offloading the key workload to the accelerator, and compare Intel Xeon Phi 7120p, Tesla K80, and Tesla P100 accelerators to Intel Xeon E5-2680v3 and Xeon Phi 7210 central processing units; and (ii) memory efficiency, where we present two techniques that increase the efficiency of the HTFETI solver 1.8 times and push the limit of the largest problem that ESPRESO can solve from 124 to 223 billion unknowns for problems with unstructured meshes. Finally, we show that by dynamically tuning hardware parameters, we can reduce energy consumption by up to 33%.
