Implementation of the efficient communication layer for the highly parallel total FETI and hybrid total FETI solvers

• Implementation, performance, and scalability results of the communication layer for the Total FETI and Hybrid Total FETI solvers.
• In HTFETI, several neighboring subdomains are aggregated into clusters, which reduces the size of the coarse problem and improves scalability.
• Optimization of the nearest-neighbor communication: multiplication with the global gluing matrix.
• Implementation of communication-hiding and communication-avoiding techniques inside the communication layer.
• Benchmarks: an elastic 3D cube with up to 1.6 billion DOF and a realistic car engine model.
• Large tests executed with Total FETI to show the real potential of the communication layer on smaller clusters.

This paper describes the implementation, performance, and scalability of the communication layer we developed for the Total FETI (TFETI) and Hybrid Total FETI (HTFETI) solvers. HTFETI is based on our variant of the Finite Element Tearing and Interconnecting (FETI) type domain decomposition method. In this approach, a small number of neighboring subdomains is aggregated into clusters, which results in a smaller coarse problem. To solve the original problem, the TFETI method is applied twice: first to the clusters and then to the subdomains within each cluster. The current implementation of the solver focuses on the performance optimization of the main CG iteration loop, including: communication-hiding and communication-avoiding techniques for global communications; optimization of the nearest-neighbor communication, i.e., multiplication with the global gluing matrix; and optimization of the parallel CG algorithm so that it iterates over local Lagrange multipliers only. The performance is demonstrated on a linear elasticity 3D cube and on real-world benchmarks.
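The idea of multiplying by the gluing matrix with nearest-neighbor communication only, while each subdomain keeps just its own Lagrange multipliers, can be illustrated with a small single-process sketch. It assumes a toy 1D bar split into subdomains, stores each subdomain's rows of the gluing matrix B as (multiplier id, local DOF, coefficient) triples, and simulates the neighbor exchange by reading the other subdomains' buffers instead of sending MPI messages. The function names (`build_1d_bar`, `find_neighbors`, `apply_B`, `apply_Bt`, `global_dot`) and the 1/multiplicity weighting in the dot product are hypothetical illustration choices, not the solver's actual data structures.

```python
from collections import defaultdict


def build_1d_bar(n_sub, dofs_per_sub):
    """Return each subdomain's rows of the gluing matrix B (hypothetical layout).

    Rows are (lambda_id, local_dof, coefficient). Multiplier 0 enforces the
    Dirichlet condition at the left end of the bar (TFETI imposes Dirichlet
    conditions through B as well); multiplier k >= 1 glues the last DOF of
    subdomain k-1 to the first DOF of subdomain k.
    """
    B_local = [[] for _ in range(n_sub)]
    B_local[0].append((0, 0, 1.0))
    for k in range(1, n_sub):
        B_local[k - 1].append((k, dofs_per_sub - 1, 1.0))
        B_local[k].append((k, 0, -1.0))
    return B_local


def find_neighbors(B_local):
    """Return {neighbor: [shared lambda ids]} per subdomain and the owners of each lambda."""
    owners = defaultdict(set)
    for s, rows in enumerate(B_local):
        for lam, _, _ in rows:
            owners[lam].add(s)
    nbr = [defaultdict(list) for _ in B_local]
    for lam in sorted(owners):
        for s in owners[lam]:
            for t in owners[lam]:
                if t != s:
                    nbr[s][t].append(lam)
    return nbr, owners


def apply_B(B_local, nbr, u):
    """Compute B u restricted to each subdomain's local multipliers."""
    # Step 1 (local): each subdomain multiplies its own rows of B.
    partial = []
    for rows, u_s in zip(B_local, u):
        p = defaultdict(float)
        for lam, dof, coef in rows:
            p[lam] += coef * u_s[dof]
        partial.append(p)
    # Step 2 (nearest neighbor): add the partial sums from the subdomains
    # that share a multiplier; here a "receive" is just a buffer read.
    result = []
    for s, p in enumerate(partial):
        r = dict(p)
        for t, shared in nbr[s].items():
            for lam in shared:
                r[lam] += partial[t][lam]
        result.append(r)
    return result


def apply_Bt(B_local, lam_local, n_dofs):
    """Compute B^T lambda per subdomain; no communication is needed."""
    out = []
    for rows, lam_s in zip(B_local, lam_local):
        v = [0.0] * n_dofs
        for lam, dof, coef in rows:
            v[dof] += coef * lam_s[lam]
        out.append(v)
    return out


def global_dot(x_local, y_local, owners):
    """Dot product of two locally stored multiplier vectors.

    A shared multiplier is stored by several subdomains, so each term is
    weighted by 1/multiplicity. The outer sum stands in for the single global
    reduction (e.g. MPI_Allreduce) that a distributed code would perform.
    """
    total = 0.0
    for x_s, y_s in zip(x_local, y_local):
        for lam, xv in x_s.items():
            total += xv * y_s[lam] / len(owners[lam])
    return total


if __name__ == "__main__":
    B_local = build_1d_bar(n_sub=4, dofs_per_sub=3)
    nbr, owners = find_neighbors(B_local)
    u = [[float(10 * s + i) for i in range(3)] for s in range(4)]
    lam = apply_B(B_local, nbr, u)
    print(lam[1])                        # {1: -8.0, 2: -8.0}
    print(global_dot(lam, lam, owners))  # 192.0
```

In this toy layout each subdomain only touches the multipliers on its own boundary: B u costs one exchange with adjacent subdomains, B^T λ costs none, and only the scalar dot products need a global reduction. Those reductions are what communication-hiding CG variants (for example, pipelined CG, which overlaps the reduction with the next operator application) are designed to mask.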