HiFlow3: a flexible and hardware-aware parallel finite element package

This paper details the concept and implementation of the parallel finite element software package HiFlow3. HiFlow3 is driven by application requirements and targets large-scale problems that arise from finite element discretizations of partial differential equations. Using object-oriented concepts and the full capabilities of C++, the HiFlow3 project follows a modular and generic approach to building efficient parallel numerical solvers. It provides modules for mesh setup, finite element spaces, degrees of freedom, linear algebra routines, numerical solvers, and output data for visualization. Parallelism, the basis for high-performance simulations on modern computing systems, is introduced on two levels: coarse-grained parallelism by means of distributed grids and distributed data structures, and fine-grained parallelism by means of platform-optimized linear algebra back-ends. The numerical schemes in HiFlow3 are built on top of both levels of parallelism. This paper describes the project, its concept, and application scenarios in detail, and outlines our hardware-aware, cross-platform portable approach, which benefits from emerging technologies such as GPU acceleration in a unified and user-friendly manner.
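
The two levels of parallelism can be illustrated with a minimal C++ sketch. It is not HiFlow3's actual API: every name here (`LocalBackend`, `NaiveCpuBackend`, `StlCpuBackend`, `DistributedVector`, `global_dot`) is hypothetical, and the distribution across processes is only simulated by a loop over blocks in one address space. A real code would hold one block per process and combine the partial results with a reduction such as MPI_Allreduce.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Fine-grained level (hypothetical interface): a platform-specific
// back-end supplies the local kernels. A GPU back-end would implement
// the same interface; only two CPU variants are shown here.
class LocalBackend {
public:
    virtual ~LocalBackend() = default;
    virtual double dot(const std::vector<double>& x,
                       const std::vector<double>& y) const = 0;
};

// Plain loop, standing in for a reference implementation.
class NaiveCpuBackend : public LocalBackend {
public:
    double dot(const std::vector<double>& x,
               const std::vector<double>& y) const override {
        double sum = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            sum += x[i] * y[i];
        }
        return sum;
    }
};

// Standard-library kernel, standing in for an optimized implementation.
class StlCpuBackend : public LocalBackend {
public:
    double dot(const std::vector<double>& x,
               const std::vector<double>& y) const override {
        return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
    }
};

// Coarse-grained level (hypothetical): one block of entries per
// simulated process.
struct DistributedVector {
    std::vector<std::vector<double>> blocks;
};

// Global dot product: each block is handled by the chosen back-end,
// then the partial results are summed. The final sum is the step a
// distributed code would replace with a reduction across processes.
double global_dot(const DistributedVector& x, const DistributedVector& y,
                  const LocalBackend& backend) {
    assert(x.blocks.size() == y.blocks.size());
    double sum = 0.0;
    for (std::size_t p = 0; p < x.blocks.size(); ++p) {
        assert(x.blocks[p].size() == y.blocks[p].size());
        sum += backend.dot(x.blocks[p], y.blocks[p]);
    }
    return sum;
}
```

Because the caller passes the back-end by reference to its abstract interface, the same `global_dot` call works unchanged when the local kernel is swapped, which is the sense in which a solver written against such an interface stays portable across platforms.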
